Abstract
Many prosocial and antisocial behaviors simultaneously impact both ourselves and others, requiring us to learn from their joint outcomes to guide future choices. However, the neurocomputational processes supporting such social learning remain unclear. Across three pre-registered studies, participants learned how choices affected both themselves and others. Computational modeling tested whether people simulate how other people value their choices or instead integrate self- and other-relevant information to guide choices. An integrated value framework, rather than simulation, characterizes multi-outcome social learning. People update the expected value of choices using different types of prediction errors defined by their target (self, other) and valence (positive, negative). This asymmetric value update is represented in brain regions that include ventral striatum, subgenual and pregenual anterior cingulate, insula, and amygdala. These results demonstrate that distinct encoding of self- and other-relevant information guides future social behaviors across mutually beneficial, mutually costly, altruistic, and instrumentally harmful scenarios.
Introduction
Successful social living often requires considering the conflicting ways our behaviors simultaneously affect ourselves and those around us. People frequently make equitable or cooperative choices that are mutually beneficial. Other-benefitting behaviors can also be performed at a cost to the self, which is termed altruism. In contrast, antisocial behaviors are those that harm others. Sometimes, antisocial behaviors benefit the self, which is termed instrumental harm. Antisocial behaviors can also be mutually costly, with negative consequences for both oneself and others. Understanding how people acquire such behaviors is important, as prosocial behaviors improve social relationships and well-being, among other positive outcomes1,2,3,4, whereas antisocial behaviors are linked to deleterious social and functional outcomes5,6,7. Although previous research has examined social decision-making with joint outcomes at a single time point (i.e., via economic games) and has also examined how people learn to select behaviors that benefit the self or others separately, little is known about how people integrate conflicting outcomes for the self and others during learning. We therefore examined the computational and neural processes that support learning behaviors that carry simultaneous outcomes for the self and others, and tested whether similar processes may generalize across different types of choices linked to joint social outcomes.
The neurocognitive architecture that allows organisms, including humans, to learn to select or avoid behaviors that carry benefits or costs for themselves has been a topic of great interest8,9,10,11. The assignment and updating of expected values for available actions appears to rely on a domain-general neural code in the ventral portion of the medial prefrontal cortex12 and has been linked to learning to select actions that benefit the self11,13. This value update is thought to occur via prediction error signals encoded in dopaminergic neuronal firing in the ventral tegmental area, which projects to the ventral striatum8,14,15,16. Furthermore, people may be biased in the way they update the expected value of choices. For example, people place greater weight on outcomes that were better than expected (i.e., they weigh positive prediction errors more than negative prediction errors)17,18. People also asymmetrically update expected values of actions that benefit others versus themselves19,20,21. For example, activity in the subgenual anterior cingulate seems to specifically track information about other-benefitting but not self-benefitting prediction errors21,22. Some evidence also suggests that people are more sensitive to unexpectedly positive outcomes when learning to avoid harming others23. It is not known, however, how information about self-versus-other and positive-versus-negative outcomes interacts when people learn how their actions affect both themselves and others.
People vary widely in their propensity for prosocial versus antisocial behavior24,25. Prosocial individuals may be those who are more sensitive to information about outcomes for others, whereas antisocial individuals may be less sensitive to information about outcomes for others. For example, those who are higher in trait cognitive empathy or lower in trait psychopathy exhibit increased learning rates when learning to benefit others (i.e., they place greater weight on other-relevant prediction errors to guide future choices)20,21 and people who are less sensitive to others’ losses during learning exhibit greater antisocial behavioral tendencies26. In addition, people who place greater weight on the learned values of available actions for others (i.e., exhibit higher inverse temperatures) display increased valuation-related ventromedial prefrontal cortex activation when learning to avoid harmful outcomes for others23.
Although real-world prosocial and antisocial behaviors typically affect the self and others simultaneously, how people incorporate conflicting information about how their choices affect themselves and others (for example, that a behavior helps oneself but harms another) is poorly understood (but see Sul et al.19). Yet such conflicts define many prosocial and antisocial choices, for example, donating money versus stealing money. Prosocial and antisocial choices have traditionally been studied with economic games that model decisions at a single point in time. Such decisions recruit brain regions that overlap with but appear distinct from purely self-benefitting learning, such as medial and lateral prefrontal cortex, striatum, anterior insula, and supplementary motor area27,28. However, these games cannot identify the processes by which people acquire prosocial or antisocial action preferences.
Prior work suggests two possible computational accounts of how this learning may occur. According to the first account, people maintain and update different expected values according to self- and other-relevant consequences. In other words, they update parallel value functions that track how actions affect themselves and others, respectively, a process that can be interpreted as a minimal form of value simulation29,30,31,32. This account aligns with valuing for others (tracking how actions impact others) rather than valuing from others' perspective33. On this account, people infer and represent other people's value function independently of the function that is updated with self-relevant information. Several lines of evidence support this hypothesis. One is that simulation ability predicts generosity34,35,36. In addition, a recent study on social influence found that participants' choices are guided by a combination of distinct value signals from personal experience and from observational learning about others' choices29. According to the second account, the value of behaviors may be updated according to different prediction errors for self- versus other-relevant outcomes but then integrated into a single expected value that guides behavior. In other words, people maintain a single value function but are differentially sensitive to different types of unexpected outcomes. This account of learning would be consistent with findings from behavioral economics that the expected utility of choices can be represented as a weighted sum of self- and other-relevant outcomes37,38. Integrating information at the outcome phase of learning, rather than maintaining all types of information, may also be more computationally efficient, especially when someone is learning how actions affect multiple agents' outcomes.
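In reinforcement-learning terms, the two accounts differ in whether one or two value functions are carried forward across trials. A minimal sketch follows; the function names and the additive form of the integrated (1Q) update are illustrative assumptions, not the paper's exact specification:

```python
# Minimal sketch contrasting the two accounts. Function names and the
# additive form of the integrated (1Q) update are illustrative assumptions.

def update_2q(q_self, q_other, r_self, r_other, a_self, a_other):
    """Value simulation (2Q): two parallel value functions, each updated
    by its own prediction error."""
    q_self = q_self + a_self * (r_self - q_self)
    q_other = q_other + a_other * (r_other - q_other)
    return q_self, q_other

def update_1q(q, r_self, r_other, a_self, a_other):
    """Value integration (1Q): one value function, updated by separately
    weighted self- and other-relevant prediction errors."""
    return q + a_self * (r_self - q) + a_other * (r_other - q)
```

Under the 2Q scheme a choice rule must combine the two values at decision time, whereas under the 1Q scheme the asymmetry lives entirely in how outcomes are weighted during the update.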
Here, we tested these alternative hypotheses using a novel multi-outcome social learning task, in which various choices simultaneously affected the self and another person, either in the same way (mutual benefit, mutual cost) or in conflicting ways (altruism, instrumental harm). This task extended earlier prosocial learning tasks in which choices result in single outcomes for different targets (e.g., self, other, no one)20,21. Using computational modeling and neuroimaging, we found that simultaneous self and social learning depends on integrating self- and other-relevant information into a single expected value per choice, which was updated asymmetrically based on different types of prediction errors related to the target (self, other) and valence (positive, negative). People who were less sensitive to unexpected positive and negative outcomes for others learned to make more antisocial choices and fewer prosocial choices, and were characterized by higher levels of subclinical psychopathic traits. Model-based neuroimaging revealed that brain areas previously associated with instrumental and social learning tracked these prediction errors guided by the asymmetric value update, including ventral striatum, subgenual anterior cingulate, pregenual anterior cingulate, amygdala, and anterior insula.
Results
Participants completed a task in which their choices simultaneously affected themselves and another person. In two versions of the task, participants viewed fractal images randomly positioned on the screen and selected one image. Each of the four images was associated with an independent probability distribution of gaining or losing points for the self and for a study partner (Supplementary Fig. S1). Specifically, there were four categories of stimuli, although participants were not aware of this. These were: mutually beneficial (the option that most frequently yielded beneficial outcomes for both themselves and their partner), instrumentally harmful (the option that most frequently benefited themselves at a cost to their partner), altruistic (the option that most frequently benefited the partner at a cost to the self), and mutually costly (the option that most frequently yielded losses to both themselves and their partner). In the four-option task, participants chose one of four images on a given trial in three runs of 60 trials each (180 trials total). In the two-option task, participants chose one of two images on a given trial in six runs of 40 trials each (240 trials total) such that every possible pair of images was presented (4!/(2! × (4 − 2)!) = 6 runs). Thus, participants had the opportunity to choose each of the four options in three different blocks across both versions of the task. If the gain or loss did not occur, there was no change in points for that person on that trial. Each choice thus resulted in outcomes for two targets: self (gain/loss/null) and other (gain/loss/null). To model separable self- and other-relevant prediction errors at the neural level, one of the outcomes (for self or other) was randomly hidden from the participant.
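The trial structure described above can be sketched as follows. The actual gain/loss probabilities and point magnitudes are not given here, so the numbers below are placeholder assumptions chosen only to reproduce the ordinal structure of the four option categories:

```python
import random

# Hypothetical outcome probabilities per option; the task's true
# distributions are not specified here, so these values are assumptions.
# Each entry gives (p_gain, p_loss) for a target; the remaining probability
# mass yields no change in points (the null outcome).
OPTIONS = {
    "mutually_beneficial":    {"self": (0.7, 0.2), "other": (0.7, 0.2)},
    "instrumentally_harmful": {"self": (0.7, 0.2), "other": (0.2, 0.7)},
    "altruistic":             {"self": (0.2, 0.7), "other": (0.7, 0.2)},
    "mutually_costly":        {"self": (0.2, 0.7), "other": (0.2, 0.7)},
}

def sample_outcome(p_gain, p_loss, rng):
    """Draw a gain (+1), loss (-1), or null (0) outcome for one target."""
    u = rng.random()
    if u < p_gain:
        return 1
    if u < p_gain + p_loss:
        return -1
    return 0

def run_trial(choice, rng=None):
    """Sample joint outcomes for one choice; only one randomly selected
    target's outcome is shown to the participant, the other stays hidden."""
    rng = rng or random.Random()
    outcome = {t: sample_outcome(*OPTIONS[choice][t], rng)
               for t in ("self", "other")}
    outcome["shown"] = rng.choice(("self", "other"))
    return outcome
```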
Therefore, on each trial, participants selected one image and then saw either whether their selection resulted in a gain, loss, or no change for themselves or whether it resulted in a gain, loss, or no change for a study partner. After any trial, they could infer how their choice affected the hidden target (Fig. 1A, B). A chi-square test of independence revealed no significant association between task version and choice frequencies, indicating that choice behavior did not differ across tasks, χ2(3, N = 208) = 6.37, p = 0.095. Participants rated the tasks as relatively easy (Supplementary Information).
Task design, choice frequencies, and behavioral modeling results for the four-option task (n = 89), two-option task (n = 119), and combined sample (n = 208). A Example choice trial for the four-option task with two possible outcomes. B Example choice trial for the two-option task with two possible outcomes. C People exhibit self-regarding biases, choosing the mutually beneficial and instrumentally harmful options most frequently. Error bars represent the standard errors around the mean proportions. D, E Model comparison metrics indicate that people integrate information about outcomes for self and others during learning. Asterisks (*) indicate the best-fitting model according to the specified model comparison metric. F Mean learning rates with 95% confidence intervals depicting how people learn from different types of prediction errors (PEs). †Note that the percentages in the two-option task do not sum to 100% because each option was displayed to participants in a pairwise fashion in three out of six blocks. Each of the four options was displayed to participants in 120 trials (out of 240 trials total).
Model-agnostic linear mixed-effects analyses predicting outcome as a function of trial, target, and trait psychopathy revealed consistent results across task paradigms. Across both paradigms, participants learned to obtain higher rewards over trials on average, and this effect was stronger for self-relevant outcomes than for other-relevant outcomes. As trait psychopathy increased, the disparity between self- and other-relevant outcomes across trials increased (Supplementary Table S1 and Supplementary Fig. S2).
People integrate information about outcomes for self and others
We next tested our pre-registered computational hypotheses about how learning occurred. To do this, we fit ten different computational models to participants’ behavior, which were validated with model identifiability and parameter recovery analyses from simulated data (see Methods). Six models (2Q models; value simulation) corresponded to the first hypothesis, according to which people maintain and update separate expected values according to self- and other-relevant outcomes (represented as a vector of expected values); in other words, they simulate how other people value their choices during learning. Four models (1Q models; value integration) corresponded to the second hypothesis, according to which people maintain and update a single expected value that integrates information about both self- and other-relevant outcomes. The models within each class were nested such that they covered all combinations of parameters (Supplementary Information).
Having established that the models were identifiable and the parameters recoverable across both task paradigms using simulated data (see Methods), we performed Bayesian model selection on the empirical data. Across all samples, results provided robust evidence that behavior reflected an algorithm that integrated self- and other-relevant information into a single expected value per choice (Model 4: 1Q–4α1β; value integration), supporting the second, value integration hypothesis. Prediction errors (δ) signaled unexpected outcomes resulting from selecting an action and, in turn, influenced the expected value (Q) of performing that action again in the future. This expected value update was weighted differently depending on both the target (self, other) and valence (positive, negative) of the prediction error. In other words, prediction errors were weighted asymmetrically to update the expected value of a choice according to four learning rate parameters: \(\alpha_{self}^{+}\) (weighting self-relevant, positive prediction errors), \(\alpha_{self}^{-}\) (weighting self-relevant, negative prediction errors), \(\alpha_{other}^{+}\) (weighting other-relevant, positive prediction errors), and \(\alpha_{other}^{-}\) (weighting other-relevant, negative prediction errors). Each learning rate quantified how much the corresponding type of prediction error influenced the expected value of a future action. The expected value of actions was also weighted by an inverse temperature parameter (β) that quantified how much the current expected value of the available actions (\(Q_{t}^{k}\)) affected the probability of selecting an action. For all samples and tasks, this model fit the data best across various model comparison metrics, including log model evidence, integrated Bayesian Information Criterion (BIC), pseudo-R2, and protected exceedance probability (Fig. 1D).
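The winning 1Q–4α1β model can be sketched as below. This is an illustrative reconstruction from the description above: the softmax choice rule and the additive combination of the two weighted prediction errors are assumptions, and on trials where one outcome was hidden only the observed prediction error would be available.

```python
import numpy as np

# Illustrative reconstruction of the best-fitting 1Q-4alpha-1beta model.
# The softmax choice rule and the additive combination of the two weighted
# prediction errors are assumptions based on the verbal description.

def softmax_choice_probs(q_values, beta):
    """Probability of selecting each option given expected values Q_t^k
    and inverse temperature beta."""
    z = beta * np.asarray(q_values, dtype=float)
    z -= z.max()                       # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def update_value(q, delta_self, delta_other, alphas):
    """Update a single expected value using four learning rates keyed by
    the target (self/other) and valence (+/-) of each prediction error."""
    a_self = alphas["self+"] if delta_self > 0 else alphas["self-"]
    a_other = alphas["other+"] if delta_other > 0 else alphas["other-"]
    return q + a_self * delta_self + a_other * delta_other
```

A higher β concentrates choice probability on the highest-valued option, while the four α parameters let, for example, a self-relevant positive surprise move the shared value more than an other-relevant negative one.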
Across both studies, we did not find evidence that task difficulty predicted the best cognitive strategy (value integration vs value simulation) (Supplementary Information). To test the possibility that observed asymmetries in learning rates might reflect differences in reward sensitivity rather than learning, we fit eight additional models that included outcome sensitivity parameters for self and other. Model fitting and model comparison supported the present findings that people learn less from other-relevant outcomes rather than simply being less sensitive to others’ rewards (Supplementary Information; Supplementary Fig. S3). To clarify what behavior the estimated parameters capture, we computed correlations between estimated parameters and various behavioral outcomes, including points earned for self and other and choice frequencies. In general, people who acquired more prosocial choice patterns were more sensitive to information about how their choices affected others (Supplementary Figs. S4–5).
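The standard per-subject fit metrics used in such model comparisons can be illustrated in simplified form; note that the integrated BIC and protected exceedance probability reported above additionally involve hierarchical Bayesian estimation across participants, which this sketch omits:

```python
import numpy as np

# Simplified per-subject fit metrics for model comparison. These are
# standard textbook forms, not the paper's exact hierarchical versions.

def bic(log_lik, n_params, n_trials):
    """Bayesian Information Criterion: lower values indicate a better fit
    after penalizing model complexity."""
    return n_params * np.log(n_trials) - 2.0 * log_lik

def pseudo_r2(log_lik, n_trials, n_options):
    """McFadden-style pseudo-R2 relative to a random-choice baseline."""
    ll_chance = n_trials * np.log(1.0 / n_options)
    return 1.0 - log_lik / ll_chance
```

A pseudo-R2 of 0 corresponds to choosing at chance among the available options, and values approaching 1 indicate near-deterministic prediction of choices.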
On average, people exhibit self-benefiting biases
Across both tasks, participants most frequently selected the options that benefitted the self (i.e., the mutually beneficial option or the instrumentally harmful option; Fig. 1C). In the four-option task, participants chose the mutually beneficial option during 45.4% of trials on average. This was followed by the instrumentally harmful option (29.1%), the altruistic option (13.5%), and then the mutually costly option (10.9%). Participants missed 1.1% of choices on average. In the two-option task, participants chose the instrumentally harmful option in 66.1% of trials on average, followed by the mutually beneficial option (63.3%), the altruistic option (35.6%), and then the mutually costly option (35.1%). Note that the percentages in the two-option task do not sum to 100% because each option was displayed to participants in a pairwise fashion in three out of six blocks. Each of the four options was displayed to participants in 120 trials (out of 240 trials total).
In line with our pre-registered hypothesis, this self-benefiting bias was also reflected in people’s asymmetric learning rate estimates. Across both tasks, linear mixed-effects modeling with random intercepts revealed that people exhibited higher self-relevant learning rates than other-relevant learning rates (four-option task: coefficient = 0.82, SE = 0.17, Z = 4.96, p < 0.001, CI 95% = [0.50, 1.15], n = 89; two-option task: coefficient = 2.09, SE = 0.17, Z = 12.46, p < 0.001, CI 95% = [1.76, 2.42], n = 119). Participants also exhibited a positivity bias (four-option task: coefficient = 0.35, SE = 0.17, Z = 2.09, p = 0.036, CI 95% = [0.02, 0.67], n = 89; two-option task: coefficient = 2.48, SE = 0.17, Z = 14.78, p < 0.001, CI 95% = [2.15, 2.81], n = 119). In other words, people learned more from self-relevant prediction errors and from positive prediction errors. In the two-option task, we also observed a valence × target interaction, such that people learned most from self-relevant, positive prediction errors and least from other-relevant, negative prediction errors (four-option task: coefficient = 0.21, SE = 0.24, Z = 0.91, p = 0.36, CI 95% = [−0.25, 0.67], n = 89; two-option task: coefficient = −0.60, SE = 0.24, Z = −2.50, p = 0.012, CI 95% = [−1.06, −0.13], n = 119; Table 1).
Higher trait psychopathy is related to decreased sensitivity to information about how choices affect others
As predicted, participants also varied in terms of their parameter estimates (Supplementary Fig. S6). To characterize this variability, we fit a series of linear mixed-effects models testing whether learning rates varied as a function of target, valence, and self-reported prosocial traits (i.e., empathy, social value orientation) or antisocial traits (i.e., trait psychopathy). Contrary to our pre-registered hypotheses, we found no relationships between learning rates in the four-option task and measures of trait cognitive or affective empathy (Supplementary Table S2). In an exploratory analysis, we also tested whether learning rates in the two-option task varied as a function of social value orientation (SVO) and found no interaction between target and SVO (Supplementary Table S3 and Supplementary Fig. S7).
By contrast, and in line with pre-registered hypotheses, trait psychopathy was associated with learning rates in both the four-option and two-option tasks (Fig. 2). In the four-option task, we found an interaction between target and psychopathy (Target × Psychopathy: coefficient = 1.418, SE = 0.530, Z = 2.675, p = 0.007, CI 95% = [0.38, 2.46], n = 89), such that individuals with higher psychopathy scores exhibited lower learning rates in response to other-relevant information. In the two-option task, psychopathy interacted with both valence and target relevance (Valence × Psychopathy: coefficient = 1.245, SE = 0.33, Z = 3.774, p < 0.001, CI 95% = [0.60, 1.90], n = 119; Target × Psychopathy: coefficient = 1.553, SE = 0.33, Z = 4.708, p < 0.001, CI 95% = [0.91, 2.2], n = 119), such that people with higher psychopathy scores exhibited lower learning rates in response to both other-relevant and negative prediction errors (Table 2).
Plots illustrate log learning rates (y-axis) as a function of trait psychopathy (x-axis), target (colors), and valence (rows) for the four-option task (n = 89; left column) and two-option task (n = 119; right column). Ribbons indicate bootstrapped 95% confidence intervals. Trait psychopathy was computed using the mean score on the Triarchic Psychopathy Measure. Learning rates are log-transformed to account for the skewed distribution.
Psychopathy is viewed as a multidimensional construct that reflects meanness, boldness, and disinhibition39. We explored how learning varied as a function of these sub-factors. We found that trait meanness, the facet that most consistently reflects callousness and antisocial attitudes and behavior, was consistently associated with reduced learning from other-relevant information (Supplementary Tables S4–6 and Supplementary Figs. S8–10). In both tasks, meanness showed a significant interaction with target relevance (four-option task: coefficient = 1.054, SE = 0.426, Z = 2.476, p = 0.013, CI 95% = [0.22, 1.89], n = 89; two-option task: coefficient = 1.087, SE = 0.204, Z = 5.322, p < 0.001, CI 95% = [0.69, 1.49], n = 119).
Brain regions encoding the expected value of chosen options
To identify brain regions that encoded the expected value of options during choices, we examined a subset of participants who completed the four-option task during neuroimaging, testing where activation parametrically varied as a function of the expected values predicted by the best-fitting computational model at the decision phase of the task. Expected value signals were encoded in bilateral medial prefrontal cortex, bilateral anterolateral temporal cortex, and bilateral posterior cingulate. These results aligned with our pre-registered hypotheses. We also found several regions in which activation was negatively related to the expected value of choices, including bilateral anterior insula, bilateral supplementary motor area, bilateral dorsolateral prefrontal cortex, bilateral precuneus, bilateral visual cortex, bilateral cerebellum, bilateral superior and inferior parietal lobule, and bilateral dorsal striatum (Fig. 3 and Supplementary Table S7). These results suggest that a distributed network of regions integrates value-related information during social learning and decision-making.
Group-level statistical map showing brain regions encoding the expected value of choices. Statistical significance was determined using a permutation test and corrected for multiple comparisons at a cluster-level family-wise error rate (FWER) of p < 0.001 (two-sided). Colormap represents the strength (magnitude of Z statistic) and direction (sign of Z statistic) of the relationship between BOLD activation and the expected value of the chosen option.
Brain regions encoding prediction errors
To identify brain regions that tracked prediction errors, we tested where activation parametrically varied as a function of the prediction errors estimated by the best-fitting computational model at the outcome phase of the task. Prediction error signals were encoded in medial prefrontal cortex, subgenual anterior cingulate, and ventral striatum. These results aligned with our pre-registered hypotheses. We also found activation in several regions that negatively correlated with prediction errors, including bilateral supplementary motor area, right insula, right middle cingulate gyrus, and left fusiform gyrus (Fig. 4 and Supplementary Table S8), suggesting these regions may play a role in error monitoring and behavioral adjustment during social learning.
Group-level statistical map showing brain regions encoding prediction errors. Statistical significance was determined using a permutation test and corrected for multiple comparisons at a cluster-level family-wise error rate (FWER) of p < 0.001 (two-sided). Colormap represents the strength (magnitude of Z statistic) and direction (sign of Z statistic) of the relationship between BOLD activation and prediction errors.
Brain regions encoding the asymmetric expected value update
Because the best-fitting computational model indicated that the expected value of choices was updated via four different types of prediction errors, we anticipated that brain regions previously linked to prediction errors might be responsible for the asymmetric value integration at the outcome stage of learning. In other words, we tested whether different brain regions encode the updated expected value on subsequent trials according to specific prediction errors. To do this, we first estimated the degree to which activations during outcomes parametrically varied as a function of the expected value update estimated from the best-fitting computational model (i.e., \(Q_{t} + \alpha \delta_{t}\)). We conducted this analysis across the whole brain using an FWE-corrected threshold of p < 0.001 (Supplementary Table S9). Then, for each type of value update, we extracted the mean parameter estimates from a priori regions of interest previously linked to prediction error coding in self and social contexts40,41,42: ventral striatum8,14,15,16, subgenual anterior cingulate21,43, anterior insula15, amygdala41,44, and pregenual anterior cingulate43,45. We then tested whether the average encoding strength of the new expected value differed from zero using a non-parametric sign-flipping procedure with 10,000 permutations and a false discovery rate (FDR) corrected threshold of q < 0.05 across the ten regions (five regions per hemisphere).
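The ROI inference procedure, a one-sample sign-flipping permutation test followed by FDR correction across regions, can be sketched as follows; the variable names and the two-sided formulation are assumptions:

```python
import numpy as np

# Sketch of a one-sample sign-flipping permutation test on ROI parameter
# estimates, with Benjamini-Hochberg FDR correction across regions.
# Names and the two-sided test are assumptions.

def sign_flip_test(estimates, n_perm=10000, rng=None):
    """Test whether the mean ROI estimate differs from zero by randomly
    flipping the sign of each participant's estimate under the null."""
    rng = rng or np.random.default_rng(0)
    estimates = np.asarray(estimates, dtype=float)
    observed = estimates.mean()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, estimates.size))
    null_means = (flips * estimates).mean(axis=1)
    # two-sided p value with the +1 correction conventional for permutations
    return (np.sum(np.abs(null_means) >= abs(observed)) + 1) / (n_perm + 1)

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean mask of
    hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = p.size
    thresholds = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```

In this design the ten p values (five regions per hemisphere) would be passed jointly to the FDR step.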
Results showed that these regions encoded each type of prediction error to varying degrees (Fig. 5). The self-relevant, positive weighted value update was encoded in bilateral ventral striatum (left: Z = 3.59, p < 0.0001; right: Z = 3.63, p < 0.0001; n = 27), bilateral subgenual anterior cingulate (left: Z = 2.43, p = 0.003; right: Z = 2.79, p = 0.0002; n = 27), bilateral amygdala (left: Z = 3.02, p < 0.0001; right: Z = 1.99, p = 0.018; n = 27), and bilateral pregenual anterior cingulate (left: Z = 3.43, p < 0.0001; right: Z = 2.64, p = 0.003; n = 27). The self-relevant, negative weighted value update was inversely encoded in bilateral anterior insula (left: Z = −2.92, p = 0.001; right: Z = −2.40, p = 0.007; n = 27). The other-relevant, positive weighted value update was encoded in bilateral subgenual anterior cingulate (left: Z = 2.92, p < 0.0001; right: Z = 3.20, p = 0.0001; n = 27), bilateral pregenual anterior cingulate (left: Z = 3.55, p < 0.0001; right: Z = 2.60, p = 0.003; n = 27), and right ventral striatum (Z = 1.88, p = 0.022; n = 27). The other-relevant, negative weighted value update was encoded in right subgenual anterior cingulate (Z = 2.08, p = 0.013; n = 27). These results demonstrate distinct patterns of value update encoding across several regions, with value updates from positive prediction errors encoded more broadly than those from negative prediction errors, and consistent value encoding across both self- and other-relevant learning. In an exploratory analysis, we also tested whether neural encoding in these regions of interest varied as a function of trait psychopathy. We found that the extent to which left ventral striatum encoded the value update for other-relevant positive prediction errors was inversely related to trait psychopathy (ρ = −0.41, p = 0.034, n = 27); however, this result did not survive multiple comparison correction at FDR q < 0.05.
Mean parameter estimates indicating the extent to which regions of interest encode asymmetric expected value updates for the four different types of prediction errors. Error bars represent the 95% confidence intervals. Rows represent different regions of interest. Columns represent different types of weighted value updating. Point plot y-axes indicate the parameter estimates indexing the relationship between BOLD activation and weighted value update. Points represent the mean estimates. Dashed line represents 0. Voxel patterns indicate the weighted value update encoding relationship depicted in the point plots from blue (negative relationship) to red (positive relationship) within a given region at the depicted brain slice. All plots represent data from the neuroimaging sample that completed the four-option task (n = 27). Asterisks (*) indicate regions surviving FDR-correction using q < 0.05. (VS = Ventral striatum, SGACC = Subgenual Anterior Cingulate Cortex, AI = Anterior Insula, AMYG = Amygdala, ACC = Pregenual Anterior Cingulate Cortex).
Discussion
In three pre-registered samples of participants and two task paradigms, we examined the computational and neural mechanisms of simultaneous learning for self and others. We provide evidence that people learn to integrate self- and other-relevant information into a single value per choice. However, this value is updated asymmetrically according to distinct prediction errors, which encode information about the target (self or other) and valence (positive or negative). Participants who were more sensitive to unexpected positive and negative outcomes for others learned to make more prosocial choices and fewer antisocial choices. By contrast, as trait-level psychopathy increased, other-relevant learning rates decreased and self-relevant learning rates increased, suggesting a computational phenotype underlying the acquisition of antisocial behaviors in psychopathy.
Model-based neuroimaging analyses showed that the expected value assigned to choices corresponded to activation in medial prefrontal cortex, anterolateral temporal cortex, and posterior cingulate. Prediction errors broadly corresponded to activation in medial prefrontal cortex, ventral striatum, and subgenual anterior cingulate. Notably, different and overlapping regions encoded the way that different types of prediction errors guided the asymmetric value update. Ventral striatum and pregenual anterior cingulate guided value updating via positive prediction errors (regardless of target). Subgenual anterior cingulate guided value updating via other-relevant prediction errors (regardless of valence) and via self-relevant positive prediction errors. Anterior insula guided value updating via self-relevant, negative prediction errors. Amygdala guided value updating via self-relevant, positive prediction errors. This asymmetry in how the brain updates expected value of choices based on who is affected and whether the outcome is better or worse than expected provides a neural basis for individual differences in the behavioral biases observed (e.g., high sensitivity to self-relevant, positive prediction errors).
Beyond identifying the brain regions supporting the asymmetric integration of new information when acquiring prosocial behaviors, our results suggest that learning to select actions that help or harm both the self and others may be computationally distinct from cognitive tasks that require actively representing and maintaining how others value our actions. Instead, people integrate self- and other-relevant information during learning to guide future prosocial behaviors, suggesting that the brain combines self- and other-regarding information into a common valuation signal, rather than maintaining entirely separate valuation systems for the self and others when choosing options that have dual social outcomes. In other words, the way people distinctly encode self- and other-relevant outcomes that result from a particular behavior guides how desirable that same behavior will be in the future, regardless of whether the behavior is mutually beneficial or costly, instrumentally harmful, or altruistic.
Decisions in our task were guided by a single expected value, which was encoded by activation in medial prefrontal cortex, a region consistently found to track expected values of choices during learning and decision-making in self-relevant11,13 and social contexts19,38,46. Our findings provide a new account of its role in decision-making when learning to choose behaviors that simultaneously affect the self and another, sometimes in the same way but sometimes in ways that conflict. By including a two-option version of the task, we were able to “stress test” the modeling framework in six different social decision contexts: mutual benefit vs instrumental harm, mutual benefit vs mutual harm, mutual benefit vs altruism, instrumental harm vs altruism, instrumental harm vs mutual harm, and altruism vs mutual harm. We observed choice pattern differences between the two task versions, which may reflect changes made to the trial structure (e.g., 50% of trials with direct competition between mutually beneficial and instrumentally harmful options in the two-option task in comparison to 100% of trials in the four-option task). Responses in the two-option version of the task may also reflect differences in participants’ preferences when only two options are available. However, the consistent computational modeling replication across studies provides robust evidence that an integrated expected value model characterizes behavior in a generalized manner.
Updating an expected value that integrates information about self and others might be most efficient in many contexts. It is possible that more complex tasks could require updating of differential values. For example, when learning in the context of social influence, one must additionally represent the value function that guides other people’s behavior29. We also observed a large set of regions encoding negative values (i.e., increasing activation with higher negative values). This is consistent with previous meta-analytic results13, although the literature is frequently dominated by positive value encoding regions. This strong negative value encoding could be due to the strong social and, potentially, emotional context of our experimental paradigm, consistent with work demonstrating that affective context can induce shifts from positive-to-negative value coding in several brain regions47,48,49.
We found a link between trait psychopathy and learning rates, a finding consistent with prior studies suggesting atypical learning for others with increased psychopathy20 and antisocial behavior26. Our study goes beyond these prior findings by demonstrating that when an action simultaneously affects the self and someone else, people with higher psychopathy are less sensitive to unexpectedly helping or harming others (i.e., they do not reliably update the expected value of their choice as strongly, and thus their future behavior remains relatively unaffected by these outcomes for others). This effect was observed for outcomes that were either better or worse than expected for the other person, although in the two-option task, psychopathy was most strongly associated with poorer learning following unexpected harm to someone else. Such a finding demonstrates a potential basis in learning for the self-serving, uncaring behaviors that typify adults with psychopathic traits, most notably the persistent instrumental aggression that is uniquely linked to psychopathy. Unlike reactive aggression, which is emotional and spontaneous, instrumental aggression is deliberately performed to achieve a goal or reward despite the harm or cost to the victim50,51. The present results suggest a computational phenotype of instrumental aggression among people who are distinguished by their relatively blunted learning to avoid behaviors that harm others or choose behaviors that benefit them, despite preserved ability to select behaviors that benefit themselves. In exploratory analyses, we found a negative relationship between trait psychopathy and neural encoding of the value update for other-relevant positive prediction errors in left ventral striatum, but this correlation did not survive multiple comparison correction.
While this finding is consistent with prior work47 and suggests that ventral striatal activation tracking other-relevant reward prediction errors is blunted among individuals with high trait psychopathy, further work will be needed to corroborate this possibility. Despite this, our work demonstrates robust external validity of our computational model across multiple social contexts and learned choices.
In general, we found that prediction errors corresponded to activation patterns in subgenual anterior cingulate, ventral striatum, and a posterior portion of the medial prefrontal cortex. However, prediction error signals were weighted differently depending on the relevant target and valence to update the future expected value of choices. We found that each type of value update corresponded to activation in different regions. Subgenual anterior cingulate most robustly encoded the expected value update pertaining to other-relevant prediction errors (regardless of valence). These findings add to accumulating literature indicating that the subgenual anterior cingulate is integral for learning in social contexts21,52. This region may encode prediction errors for more abstract discriminatory information beyond stimulus-response associations (e.g., self versus other, self versus non-self)53,54. It is important to note that in our task all decisions were embedded in a social context, because all choices carried joint social consequences even in trials when only self-relevant outcomes were displayed. Thus, participants were aware that their decisions affected their partner regardless of the outcome they saw. We implemented this study design to differentiate target-relevant prediction errors, but an important next step will be to test how our findings would generalize to situations where all target-relevant outcomes are observable. Future experimental designs in which some trials occlude all information or present all information are needed to further disentangle the role of subgenual anterior cingulate prediction error signaling in multi-outcome social learning.
We also found that valenced prediction errors are weighted asymmetrically in social learning, such that people prioritized positive over negative prediction errors. (It is also important to note that negative prediction errors could also be interpreted as positive aversive prediction errors.) This observation corroborates work showing that people update the expected value of actions asymmetrically according to better- or worse-than-expected outcomes during self-relevant learning17,18. Value updating from positive prediction errors was primarily encoded in ventral striatum, a region that consistently tracks reward prediction errors during self-relevant learning tasks14,16,55,56. The pregenual anterior cingulate also encoded this signal regardless of target, suggesting this region may play a similar role for integrating positive prediction errors and resolving unexpected signals for both the self and other42,43,57. It is debated whether the amygdala encodes valenced prediction errors per se. For example, some work suggests that it instead encodes the extent to which a choice has previously been accompanied by an unsigned prediction error40,58,59. Our findings shed some light on this debate, indicating that the signal encoded by amygdala plays a unique role in value updating from self-relevant positive prediction errors. Future work will be needed to assess whether the amygdala response related to the value update is due to the valence or magnitude of the prediction error. In contrast, value updating from self-relevant negative prediction errors seemed to be inversely encoded in anterior insula. This finding extends initial work showing that anterior insula encodes prediction errors during self-relevant aversive learning15 and suggests this region could play a role in detecting and correcting negative prediction errors during social learning such that its activation decreases as the prediction error is resolved.
Despite important differences in the tasks, our analyses revealed that the best-fitting computational model is similar to those reported from studies investigating learning for others alone20,21,60. In these studies, participants also weighted prediction errors differently for self than for others. Our results thus replicate that learning to increase one’s own welfare can be computationally distinguished from learning to increase others’ welfare. It is possible, however, that different results might be found in a task in which participants or others are at risk of physical harm (e.g., an electric shock) rather than at financial risk23. Future work should investigate the computations required for multi-outcome social learning in other contexts unrelated to monetary outcomes. While this work advances our current understanding of prosocial and antisocial learning across different contexts, an important next step will be to examine the interaction between learning behavior and various social preferences (e.g., inequity aversion)61. Although our task was not well-suited to answer this question, future work can consider how asymmetries in social utility affect learning for other people.
Together, this work provides a neurocomputational account of how people learn for others when required to consider how a single choice will affect themselves and someone else. These findings extend our understanding of the computational substrates of the types of prosocial and antisocial learning that occur frequently in daily life: learning behaviors that benefit others at a cost to the self, and learning behaviors that benefit the self at a cost to others. They suggest that learning about the consequences of our actions for oneself and others over time cannot as easily be teased apart as the correlates of value-based decision-making for self and others in a single moment27,28. Because value signals cannot be disentangled at the level of decisions, but only at the level of outcomes, our findings show that learning to select actions that can help or harm both the self and others does not require the active representation and maintenance of another person’s value function separate from one’s own. They also support the utility of computational modeling in understanding real-world variance in socioemotional traits, suggesting that the development of antisocial traits may reflect impaired learning when decisions have consequences not just for the self but also for other people.
Methods
Pre-registrations and deviations
Experimental procedures, including hypotheses, simulations, and empirical data collection, were pre-registered on the Open Science Framework (https://osf.io/zt269/). We submitted two pre-registrations, one for each version of the task. Both pre-registrations described the sample size targets based on power analyses, inclusion/exclusion criteria, task design, and model identifiability and parameter recovery results. Both also described the alternative hypothesis comparison using computational model comparison (1Q/value integration vs 2Q/value simulation). They also both hypothesized that self-relevant learning rates would be greater than other-relevant learning rates, and that higher trait psychopathy would be linked to lower other-relevant learning rates. Pre-registration 1 with the four-option task did not include models 2, 4, 9, and 10 (Supplementary Information), but the second pre-registration with the two-option task included all 10 models. Models 11–18 were added following a reviewer request and were not included in the pre-registrations. In pre-registration 1, we additionally hypothesized that higher trait empathy would be linked to higher other-relevant learning rates, medial prefrontal cortex would encode expected value during decisions, ventral striatum activation would encode prediction errors for both self and other, and subgenual anterior cingulate would encode prediction errors for other. Pre-registration 1 also pre-registered fMRI whole-brain analyses using the four-option task. The regions of interest were also included, but the exact analysis details pertaining to the four different types of prediction errors guiding the value update deviated from the original plan because we did not predict that the best model would include four prediction errors. We also conducted exploratory analyses to examine the relationship between learning rates and social value orientation and the relationship between neural encoding and trait psychopathy.
Experimental paradigm
The experimental paradigm was a probabilistic social reinforcement learning task adapted from Lockwood et al.21, in which agents learn to select options that optimize outcomes (i.e., earn rewards and avoid punishments) for themselves and a partner.
Four-option task
On each trial, four stimuli (e.g., fractal images) were presented—each associated with independent probability distributions for self outcomes and other outcomes. Agents selected one stimulus on a given trial and then received feedback indicating whether their selection resulted in a gain, loss, or no change for themselves and whether it resulted in a gain, loss, or no change for a study partner. Each choice thus resulted in two outcomes: self and other. The outcome probabilities were held constant throughout the task and set to 75% (high), 15% (low), and 10% (low) for each of the three possible outcomes independently for self and other. These probabilities were chosen to elicit variable prediction errors across trials in each run (i.e., such that the most optimal choice would sometimes yield unexpected outcomes). The mutually beneficial stimulus, the optimal choice, was set to have a high probability of gain for self and gain for other, yielding that optimal joint outcome 56.2% of the time (75% self gain × 75% other gain). Thus, because self- and other-outcome probabilities were set independently for each stimulus, a joint probability of each outcome occurring for self and other can be calculated for each stimulus (Supplementary Fig. S1).
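Because the self- and other-outcome probabilities are set independently, the joint probability of any outcome pair is the product of the two marginals. A minimal sketch (the dictionary layout and variable names are illustrative; the probabilities are those given above):

```python
# Independent outcome probabilities for self and other for the mutually
# beneficial stimulus (values from the task design described above).
p_self = {"gain": 0.75, "loss": 0.15, "null": 0.10}
p_other = {"gain": 0.75, "loss": 0.15, "null": 0.10}

# Joint probability of each (self, other) outcome pair = product of marginals.
joint = {(s, o): ps * po
         for s, ps in p_self.items()
         for o, po in p_other.items()}

print(joint[("gain", "gain")])  # 0.5625, i.e., the optimal joint outcome ~56.2%
```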
The task was split into three independent runs, each containing a different set of stimuli. Each run was pseudo-randomly ordered and contained 60 trials (180 trials total). For each agent, stimuli within runs were independently assigned one of the four possible payout matrices (i.e., mutually beneficial, instrumentally harmful, altruistic, mutually costly), and outcome orders were randomly shuffled. The four stimuli were also randomly positioned on the screen across trials and selected via a corresponding button press. Importantly, one of the two outcomes (for self or other) was randomly hidden from the agent, which allowed modeling of separable prediction errors for self and other outcomes. In other words, on half of the trials, only the self outcome was displayed to the agent; on the other half of trials, only the other outcome was displayed to the agent. For the neuroimaging task, the inter-stimulus intervals were jittered in time using custom code to optimize design efficiency.
Two-option task
To test whether our initial observations were due to task complexity, we developed a variation of the original paradigm in which agents chose between only two options on a given trial, instead of the four options of the original paradigm. This new paradigm comprised six blocks of 40 trials (240 trials total). Each block thus resembled a classic two-armed bandit task21,62. The options in each block resembled the same types of options in the original paradigm: mutually beneficial (the option that most frequently yielded beneficial outcomes for both themselves and their partner), instrumentally harmful (the option that most frequently benefited themselves at a cost to their partner), altruistic (the option that most frequently benefited the partner at a cost to the self), and mutually costly (the option that most frequently yielded losses to both themselves and their partner). These options retained the same independent probability distributions for self- and other-relevant outcomes as in the original paradigm (Supplementary Fig. S1). These options were organized in a pairwise fashion into six blocks (4!/(2! × (4 – 2)!) = 6) such that agents chose between every combination of option pairs. These blocks were randomized for each agent such that no agent completed blocks in the same order.
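The six pairwise blocks are the unordered 2-element combinations of the four option types, which can be enumerated directly (a short illustrative sketch; option labels follow the terminology above):

```python
from itertools import combinations

options = ["mutually beneficial", "instrumentally harmful",
           "altruistic", "mutually costly"]

# All unordered pairs: 4! / (2! * (4 - 2)!) = 6 blocks.
blocks = list(combinations(options, 2))
print(len(blocks))  # 6
```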
On each trial, participants chose one of the two options, which appeared on the left and right side of the screen. Positions of the options were randomly counterbalanced within blocks. Then, participants observed the outcomes. Each choice resulted in two outcomes: self (gain/loss/null) and other (gain/loss/null). To mimic the original paradigm, one of the outcomes (for self or other) was always randomly hidden from the participant. Therefore, on each trial, participants selected one option and then either saw whether their selection resulted in a gain, loss, or no change for themselves or alternatively saw whether their selection resulted in gain, loss, or no change for a study partner. This task only differed from the original in that agents chose between two options on a given trial, and they completed this task in six blocks of 40 trials.
Computational modeling
Models
We constructed models based on reinforcement learning theory that assumed that individuals updated an expected value Q for each stimulus k based on Rescorla-Wagner learning10. According to this learning rule, the future expected value \({Q}_{t+1}^{k}\) is the current expected value \({Q}_{t}^{k}\) plus a weighted prediction error \({\delta }_{t}\), which is the actual outcome \({r}_{t}\) minus the expected value \({Q}_{t}^{k}\). Prediction errors are weighted by the learning rate (α), which quantifies the extent to which prediction errors influence the expected value (Q) of a future choice (k). A higher learning rate parameter (α) indicates that an agent is more sensitive to unexpected gains or losses. The expected value of actions is weighted by the inverse temperature (β), which quantifies how much the expected value of available actions affects the probability of selecting a given action. A lower inverse temperature parameter (β) indicates that an agent selects actions more randomly (in other words, they explore different actions more often rather than consistently choosing the action with the highest known value). Each model specified that agents made choices with probability p on trial t according to a softmax function. All models included at least one inverse temperature (β) parameter and at least one learning rate (α) parameter. See Supplementary Information for detailed descriptions of each model.
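The learning rule and choice rule described above can be sketched as follows (a minimal illustration; function names and example values are ours, not the study code):

```python
import math

def rescorla_wagner_update(q, reward, alpha):
    """Move the expected value Q toward the outcome by the prediction error."""
    delta = reward - q          # prediction error: actual outcome minus expectation
    return q + alpha * delta    # learning rate alpha scales the update

def softmax_probs(q_values, beta):
    """Choice probabilities given expected values and inverse temperature beta."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

# One illustrative learning step on a single stimulus.
q = rescorla_wagner_update(0.0, reward=1.0, alpha=0.3)   # Q moves from 0.0 to 0.3
probs = softmax_probs([q, 0.0, 0.0, 0.0], beta=2.0)      # learned option now favored
```

Note that a higher β sharpens the softmax toward the highest-valued option, while β near zero makes all four choice probabilities approach 0.25, matching the exploration interpretation above.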
The best-fitting model was a variant temporal difference model with five free parameters (α+self, α+other, α−self, α−other, β) that assumed that agents integrated self- and other-relevant information into a single expected value per choice, but updated this value based on different types of prediction errors related to the target (e.g., self, other) and valence (e.g., positive, negative). They also weighed the expected values of choices by a single inverse temperature.
Expected value update:
\({Q}_{t+1}^{k}={Q}_{t}^{k}+\alpha \cdot {\delta }_{t}\)
Prediction error:
\({\delta }_{t}={r}_{t}-{Q}_{t}^{k}\)
Learning rates:
\(\alpha ={\alpha }_{self}^{+}\) if the displayed outcome is self-relevant and \({\delta }_{t}\ge 0\); \(\alpha ={\alpha }_{self}^{-}\) if self-relevant and \({\delta }_{t} < 0\); \(\alpha ={\alpha }_{other}^{+}\) if other-relevant and \({\delta }_{t}\ge 0\); \(\alpha ={\alpha }_{other}^{-}\) if other-relevant and \({\delta }_{t} < 0\)
Choice rule (softmax):
\({p}_{t}^{k}=\frac{{e}^{\beta {Q}_{t}^{k}}}{{\sum }_{j}{e}^{\beta {Q}_{t}^{j}}}\)
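A minimal sketch of the winning model's asymmetric update, assuming one integrated Q value per choice and a learning rate selected by the target and valence of the prediction error (function names and the example parameter values are illustrative, not the study code):

```python
def update_integrated_q(q, outcome, target, alphas):
    """Update a single integrated Q value with a target- and valence-specific
    learning rate, as in the five-parameter model described above."""
    delta = outcome - q                      # prediction error for the shown outcome
    valence = "+" if delta >= 0 else "-"     # positive vs negative prediction error
    return q + alphas[(target, valence)] * delta

# Illustrative values for (alpha+self, alpha-self, alpha+other, alpha-other).
alphas = {("self", "+"): 0.40, ("self", "-"): 0.20,
          ("other", "+"): 0.30, ("other", "-"): 0.10}

q = update_integrated_q(0.5, outcome=1.0, target="other", alphas=alphas)
# q = 0.5 + 0.30 * (1.0 - 0.5) = 0.65
```

Because a single Q per choice is maintained, whichever outcome is displayed on a trial (self or other) updates the same value, only with a different weight.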
Model fitting
To fit the variations of the learning model to (real and simulated) data, we used an iterative maximum a posteriori (MAP) approach with expectation maximization63,64. This method provides better estimates than single-step maximum likelihood estimation (MLE) alone because it is less susceptible to the influence of outliers. It does this by implementing two levels: the lower level of the individual subjects and the higher level reflecting the full sample. For the MAP procedure, we initialized group-level Gaussians as uninformative priors with means of 0.1 (plus some added noise) and variances of 100. During the expectation step, we estimated the model parameters (α and β) for each participant using an MLE approach calculating the log-likelihood of the subject’s series of choices given the model. We then computed the maximum posterior probability estimate, given the observed choices and the prior computed from the group-level Gaussian, and recomputed the Gaussian distribution over parameters during the maximization step. We repeated the expectation and maximization steps iteratively until convergence of the posterior likelihood summed over the group, or a maximum of 800 steps. Convergence was defined as a change in posterior likelihood <0.001 from one iteration to the next. Note that bounded free parameters were transformed from the Gaussian space into the native model space via appropriate link functions (e.g., a sigmoid function in the case of the learning rate) to ensure accurate parameter estimation near the bounds.
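The core ingredients of this two-level scheme can be illustrated in a short sketch; the subject-level likelihood function and the optimizer are omitted, and all names are ours rather than the study code:

```python
import math

def sigmoid(x):
    """Link function mapping Gaussian space to (0, 1) for learning rates."""
    return 1.0 / (1.0 + math.exp(-x))

def log_gaussian(x, mu, var):
    """Log density of the group-level Gaussian prior."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def map_objective(params, choice_loglik, prior_means, prior_vars):
    """Per-subject MAP objective: choice log-likelihood plus log prior."""
    log_prior = sum(log_gaussian(p, m, v)
                    for p, m, v in zip(params, prior_means, prior_vars))
    return choice_loglik(params) + log_prior

# Uninformative starting priors, as described above: means ~0.1, variance 100.
prior_means = [0.1, 0.1]
prior_vars = [100.0, 100.0]
```

In the full procedure, the expectation step maximizes this objective for each subject, the maximization step refits the group-level Gaussians to the resulting estimates, and iteration stops once the summed posterior likelihood changes by less than 0.001 or after 800 steps.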
Model comparison
For model comparison, we calculated the Laplace approximation of the log model evidence (LME; more positive values indicate better model fit)65 and the integrated Bayesian Information Criterion (BIC; lower is better)63,64. As an additional measure of model fit, we computed the pseudo-R2 by extracting the choice probabilities generated for each agent on each trial from the winning model and computing the squared median choice probability across agents.
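As a rough illustration of these fit metrics (a simplified sketch: the Laplace-approximated LME and the integrated BIC involve additional group-level terms not shown here; function names are ours):

```python
import math
import statistics

def bic(log_likelihood, n_params, n_trials):
    """Standard BIC (lower is better); a simplified stand-in for the
    group-level integrated BIC described in the text."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood

def pseudo_r2(choice_probs):
    """Squared median of the model-generated choice probabilities,
    per the definition above."""
    return statistics.median(choice_probs) ** 2
```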
Simulation experiments
Using these ten models, we ran simulations to test whether our model comparison procedure could correctly distinguish between the models and whether the winning model could correctly estimate parameter values, under ideal conditions with simulated data66,67,68.
Model identifiability
We simulated data from all ten models to establish that we could accurately identify the best model across a wide range of parameter values. For this model identifiability analysis, we simulated data from 150 agents, drawing parameters from distributions commonly used in the reinforcement learning literature20,69,70. Prediction error learning rates (α) were drawn from a beta distribution (betapdf(parameter,1.1,1.1)) and softmax temperature parameters (β) from a gamma distribution (gampdf(parameter,1.2,5)). We fitted each of the ten simulated datasets to each of the ten models using the iterative MAP approach described above and repeated the procedure 10 times. Across the 10 runs, we demonstrate that the models are identifiable using our model comparison process by plotting confusion matrices of how many times each model won (Supplementary Fig. S11–12).
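The sampling step can be sketched with Python's standard library in place of the MATLAB betapdf/gampdf calls (the seed and variable names are illustrative):

```python
import random

random.seed(2024)   # illustrative seed for reproducibility
N_AGENTS = 150

# Learning rates ~ Beta(1.1, 1.1) and softmax temperatures ~ Gamma(shape=1.2,
# scale=5), mirroring the parameterizations described above.
sim_alphas = [random.betavariate(1.1, 1.1) for _ in range(N_AGENTS)]
sim_betas = [random.gammavariate(1.2, 5.0) for _ in range(N_AGENTS)]
```

Beta(1.1, 1.1) is nearly flat over [0, 1], and Gamma(1.2, 5) is a broad, right-skewed distribution over positive values, so both keep the simulated parameters within their natural bounds while covering a wide range.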
Parameter recovery
To assess the reliability of this model and the interpretability of the free parameters, we performed parameter recovery on simulated choices for all models using the trial schedules from the task. We then refit the simulated choices using the same MAP process described above. We found strong Pearson’s r correlations between the true simulated and estimated parameter values, suggesting our experiments were well suited to estimate the models’ parameters (Supplementary Fig. S13–14).
Participants
We collected data from three independent, pre-registered samples of participants (N = 220). Two in-person samples completed a version of the task in which participants chose one of four options on each trial (i.e., four-option task; N = 90): a behavior-only sample and neuroimaging sample. To confirm that findings were not due to task complexity, a third sample completed an online version of the task in which participants chose one of two options on each trial (i.e., two-option task; N = 130).
Four-option task
The pre-registered sample sizes for the four-option version of the task were determined from a power analysis based on the results reported in Lockwood et al.21, which used a similar task but with only one positive or neutral outcome per trial. This showed an effect size of d = 0.87 when comparing the learning rate for self-relevant learning with those for prosocial learning. Using the G*power software version 3.1.9.1, we determined that a sample size of 20 participants would allow us to detect the same magnitude of effect with >95% power and 5% false-positive rate (two-tailed). This was the minimum sample size determined for both studies. However, we selected and pre-registered samples larger than n = 20 to account for possible attrition. We recruited participants using the Georgetown University Research Volunteer Program, Research Match, and flyering in the local community. We collected data from two independent samples of participants (N = 90): a behavior-only sample and a neuroimaging sample. Our pre-registration specified the following inclusion criteria: between 18 and 35 years old, no previous or current neurological disorder or neural injury, English as primary language, normal or corrected hearing and vision, no MRI safety contraindications for the neuroimaging sample, should not select the mutually costly option most frequently, and no more than 15% motion outliers (indexed as >0.5 mm framewise displacement). Only one participant in the behavior-only sample did not meet our pre-registered criteria for inclusion (i.e., this participant selected the mutually costly option most frequently and also self-reported that they chose stimuli “at random”). Despite this exclusion, our pre-registered sample size goals were exceeded in the behavior-only sample (N = 62; 18–29 years; Mean Age = 21.31 years; SD = 2.45 years; 59.7% Female) and neuroimaging sample (N = 27; 19–29 years; Mean Age = 21.96 years; SD = 3.02 years; 48.1% Female).
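The reported sample size can be approximated without G*Power using the standard normal-approximation formula for a two-tailed test (a rough sketch; G*Power's exact computation uses the noncentral t distribution and yields a slightly larger n, and the function name is ours):

```python
import math
from statistics import NormalDist

def approx_n(effect_size, alpha=0.05, power=0.95):
    """Normal-approximation sample size for a two-tailed paired comparison."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-tailed critical value
    z_b = NormalDist().inv_cdf(power)               # power requirement
    return math.ceil(((z_a + z_b) / effect_size) ** 2)

n = approx_n(0.87)   # ~18 by normal approximation, near the reported n = 20
```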
These samples were matched in terms of sex (Welch’s t = 0.81, p = 0.42) and age (Welch’s t = 0.99, p = 0.33) (N = 89; 18–29 years; Mean Age = 21.5 years; SD = 2.63 years; 56.1% Female).
Two-option task
The pre-registered sample size for the two-option task was determined from a power analysis based on the results from the four-option task, which revealed an effect size of f = 0.26 when testing the interaction effect between target and psychopathy scores on learning rates. Using the G*power software version 3.1.9.7, we determined that a sample size of 119 participants would allow us to detect the same magnitude of effect with >80% power (1-β) and 5% false-positive rate (α). This was the minimum sample size for this study. However, we selected and pre-registered a sample larger than 120 to account for possible attrition. We recruited 130 participants using the Georgetown University Research Volunteer Program. Our pre-registration specified the following inclusion criteria: between 18 and 35 years old, English as primary language, normal or corrected hearing and vision, should not select the mutually costly option most frequently, and respond to two out of three (66.7%) attention checks included in the questionnaire. Eleven participants in this sample did not meet our pre-registered criteria for inclusion—i.e., these participants selected the mutually costly option most frequently. This sample size thus met our pre-registration goal (N = 119; 18–31 years; Mean Age = 22.04 years; SD = 2.85 years; 42.9% Female). The two-option task sample was matched to the four-option task sample in terms of both sex (Welch’s t = −1.91, p = 0.06) and age (Welch’s t = −1.40, p = 0.16).
All participants provided written informed consent to participate in this study, which was approved by the Institutional Review Board at Georgetown University.
Experimental procedure
Behavioral response data during the completion of the learning task were collected from two independent samples of participants: behavior-only and neuroimaging. Participants completed the four-option task in three blocks (60 trials per block, 180 trials total), with each block containing a different set of four stimuli. Participants completed the two-option task in six blocks (40 trials per block, 240 trials total), with each block containing a different pair of stimuli. Blocks were counterbalanced and randomly ordered across participants.
Behavior-only data collection
Each version of the task was completed using arrow keys on a computer keyboard via PsychoPy71 in-person (four-option task) and online via Pavlovia.org (two-option task). Participants were instructed that their choices would affect an unseen participant who is taking part in a separate study, who would not complete the same task, and who they would not meet. Participants were instructed that they could learn by trial and error and would receive feedback that showed the outcome either for themselves or the study partner. However, they were not aware of the categories of stimuli (mutually beneficial, mutually costly, altruistic, instrumentally harmful). Prior to starting the task, they were presented with standardized instructions and then asked to complete the following True/False comprehension check questions (Supplemental Table S10–11). After answering each question, participants were provided with feedback regarding their answer (i.e., either “correct” or “incorrect” with the correct answer displayed) in order to ensure comprehension of the task instructions (Supplementary Information).
Neuroimaging data collection
The neuroimaging experiment took place as part of a larger study during a 2-h and 30-min appointment at the Georgetown University Center for Functional and Molecular Imaging. Prior to beginning the scan, participants met their study partner (female). This partner was consistent for all participants. The partner sat at a desk roughly five meters away from the doorway. Participants were introduced briefly and did not enter the room. Then, they were brought to a different room to complete an MRI safety screen and consent form. Prior to starting the scan, participants received standardized instructions that described the structure of the task. Importantly, the instructions explained that their choices were linked to real-world monetary outcomes: points earned for themselves or their study partner would be tallied, converted to actual money at the end of the study, and actually paid out. During the first 30 min of scanning, participants completed a field map, three runs of the learning task, and then an anatomical scan. For the remainder of the scan (~30 min), the participants listened to a narrative (not discussed in the present study). Prior to beginning the first run of the learning task, participants were read standardized reminder instructions (Supplemental Information). On a given trial, agents saw the options (randomly positioned) and had 2000 ms to select one. After 2000 ms, a red box appeared around the selected stimulus for 500 ms, followed by a fixation cross jittered with a mean of 1500 ms. Then, feedback appeared on the screen for 2000 ms. At random, outcomes were displayed only for self (“YOU”) or only for the partner (“PARTNER”). Inter-trial intervals displayed a fixation cross jittered with a mean of 2000 ms before a new choice screen was displayed. Each run of the learning task lasted 8 min and 21 s and consisted of the trial structure described above. The stimuli were displayed using PsychoPy 2020.1.171.
Post-task questionnaires
Participants also completed a battery of self-report questionnaires measuring individual differences. A 31-item measure of cognitive and affective empathy, the Questionnaire of Cognitive and Affective Empathy (QCAE)72, was collected after the four-option task, following previous work demonstrating relationships between trait empathy and (1) learning rates when learning to gain rewards for others21 and (2) sensitivity to the subjective values of choice alternatives (e.g., inverse temperature) when learning to avoid punishments for others23. A 6-item measure of social value orientation (SVO)73 was collected after the two-option task, following previous work demonstrating that people vary in their motivations when allocating economic resources between themselves and others. A 58-item measure of psychopathy, the Triarchic Psychopathy Measure (TriPM)39, including its three component factors, was collected after both tasks, following previous work demonstrating the relationship between antisociality and sensitivity to others’ losses and prediction errors when learning for others20,26. This psychopathy measure is consistent, reliable, and valid in comparison to other clinically-assessed psychopathy measures74,75. Participants also completed a battery of demographic questions (e.g., age, sex, race/ethnicity, socioeconomic status).
MRI data acquisition
Scans were collected using a 3T Siemens Prisma system at the Georgetown University Center for Functional and Molecular Imaging. After an initial localization scan, a double-echo field map (phase encoding direction = A » P) was collected, followed by three runs of the learning task. We applied an orbitofrontal tilt from the AC-PC line to mitigate signal dropout in the medial prefrontal cortex76,77. For each run of the task, 501 functional volumes were acquired using a multiband T2*-sensitive echo-planar imaging (EPI) pulse sequence (TR = 1000 ms, TE = 24 ms, voxel size = 3.0 mm3, flip angle = 60°, bandwidth = 2646 Hz/pixel, echo spacing = 0.49 ms, matrix size = 70 × 70, slices = 56, field of view = 210 mm × 210 mm × 168 mm, multiband factor = 4). The first 9 non-steady-state volumes of each EPI run were automatically dropped, leaving 492 volumes per run. Then, we collected a high-resolution T1-weighted structural scan (TR = 2300 ms, TE = 2.99 ms, voxel size = 1.0 mm3, flip angle = 9°, matrix = 288 × 288 × 176). After acquiring the MPRAGE, participants completed two runs of naturalistic listening using the same EPI parameters as above with longer acquisitions (1192 and 681 volumes, respectively) (not discussed in the present study).
MRI preprocessing
All preprocessing was implemented using fMRIPrep78, which included anatomical T1-weighted brain extraction, anatomical surface extraction, head-motion estimation and correction, susceptibility-derived distortion estimation and unwarping, intrasubject registration, and spatial normalization (intersubject registration). All volumes were registered to the MNI (Montreal Neurological Institute) standard template. The resulting data were smoothed using a 6.0 mm FWHM Gaussian kernel, and the time series were scaled to have a mean of 100 prior to first-level statistical modeling.
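Scaling each voxel's time series to a mean of 100 puts first-level parameter estimates in units interpretable as percent signal change. A minimal sketch with purely illustrative values (the actual scaling was performed as part of the preprocessing pipeline described above):

```python
import numpy as np

# One voxel's raw BOLD time series (illustrative values only).
ts = np.array([980.0, 1000.0, 1020.0])

# Scale to a mean of 100: a value of 102 then reads as a 2% signal increase
# relative to that voxel's mean.
scaled = ts / ts.mean() * 100.0
```

After scaling, a regression coefficient of 0.5 for a parametric modulator corresponds directly to a 0.5% signal change per unit of the modulator.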
Analysis of brain and behavior
Computational modeling
We fit each of the ten models using the MAP approach described above63,64 and assessed which model best fit the data using the log model evidence, integrated BIC, and pseudo-R2. We conducted this analysis separately for each of our two samples (behavior-only and neuroimaging) to derive trial-level components (e.g., expected values, prediction errors) as well as individual-level components (e.g., learning rates, inverse temperatures). Across the full sample, we used a random-effects family-level inference approach79,80 to compute protected exceedance probabilities. This method partitions the model space into families that share common features and assesses the probability that one model family is more likely than the other, given the observed data and accounting for inter-subject variability. We partitioned the model space into two families (1Q/value integration and 2Q/value simulation) to test which strategy was predominantly used.
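The winning value-integration strategy can be sketched as a single Rescorla-Wagner-style learner whose learning rate depends on the outcome's target (self vs. other) and the sign of the prediction error. The parameter values and trial sequence below are purely illustrative, not the fitted estimates reported in the paper:

```python
# Sketch of a 1Q (value-integration) learner with asymmetric learning
# rates by outcome target (self/other) and prediction-error valence (+/-).

def update_value(q, outcome, target, alphas):
    """One Rescorla-Wagner update with a target- and valence-specific rate."""
    delta = outcome - q                       # prediction error
    valence = "pos" if delta >= 0 else "neg"
    alpha = alphas[(target, valence)]         # e.g., alphas[("self", "pos")]
    return q + alpha * delta, delta

# Hypothetical learning rates for each target x valence combination.
alphas = {("self", "pos"): 0.40, ("self", "neg"): 0.30,
          ("other", "pos"): 0.25, ("other", "neg"): 0.15}

q = 0.0
# Feedback reveals the outcome for either the self or the other on each trial.
for outcome, target in [(1.0, "self"), (-1.0, "other"), (1.0, "other")]:
    q, delta = update_value(q, outcome, target, alphas)
```

Because a single integrated Q-value is updated regardless of whose outcome was shown, this family contrasts with the 2Q/value-simulation family, which would track separate values for self and other.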
Individual differences
To characterize variability in sensitivity to different types of prediction errors, we employed linear mixed-effects modeling with random intercepts using statsmodels in Python. Specifically, we assessed how learning rate parameters varied as a function of target (self > other) and valence (positive > negative). Fixed effects were estimated by coding target and valence as indicator variables predicting learning rate. We also assessed how learning rates varied as a function of cognitive and affective empathy (four-option task) and trait psychopathy (both tasks), including the three sub-factors of trait psychopathy (meanness, boldness, disinhibition). In an exploratory analysis, we also tested how learning rates in the two-option task varied as a function of social value orientation (SVO). Statistical significance for all models was determined using a two-sided p < 0.05.
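The indicator coding of the fixed effects can be illustrated with a toy design matrix. For brevity, this sketch estimates the fixed effects by ordinary least squares; the reported analysis used statsmodels mixed-effects models, which add participant-level random intercepts on top of this same fixed-effects design. All learning-rate values here are hypothetical:

```python
import numpy as np

# Long-format toy data: one learning rate per target x valence cell
# (values are illustrative, not fitted estimates).
rows = [
    # alpha, target (1 = self, 0 = other), valence (1 = pos, 0 = neg)
    (0.40, 1, 1),
    (0.30, 1, 0),
    (0.25, 0, 1),
    (0.15, 0, 0),
]
y = np.array([r[0] for r in rows])
X = np.array([[1.0, r[1], r[2]] for r in rows])  # intercept, target, valence

# Fixed effects via least squares; target_effect tests self > other and
# valence_effect tests positive > negative learning rates.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, target_effect, valence_effect = beta
```

Here `target_effect` and `valence_effect` recover the simple self-minus-other and positive-minus-negative differences because the toy cells are exactly additive.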
BOLD activation
To estimate BOLD activation tracking the updating value of options during decisions and prediction errors during outcomes, we performed temporal data reduction using a standard first-level General Linear Model (GLM) approach. Based on the computational model that best explained behavior, we first constructed a GLM for each participant using regressors convolved with a canonical hemodynamic response function for all decision trials, a parametric modulator of decisions by expected values (Q from the model), all outcome trials, and a parametric modulator of outcomes by prediction errors (δ). These regressors were modeled against fixation as the implicit baseline. We also included regressors for confounding variables generated from fMRIPrep: constant regressors for each of the three runs of acquisition, two regressors for global signals extracted from within the cerebrospinal fluid and white matter, three translational motion parameters, three rotational motion parameters, and a set of cosine basis functions up to a cutoff of 128 s. For relevant participants, we also added regressors for missed choices/outcomes and censored volumes with motion greater than our pre-registered threshold (framewise displacement > 0.5 mm). This yielded two maps of interest corresponding to value-based decisions and prediction errors. These maps were then used for group-level non-parametric analyses with 10,000 permutations to identify regions in which activation tracked, across the whole brain, the expected values that participants assigned to choices and the prediction errors. We thresholded the resulting whole-brain maps using FWE correction at p < 0.001.
We additionally aimed to estimate BOLD activation that tracked the asymmetric value integration at the outcome stage of learning—i.e., activation encoding the updated expected value carried to subsequent trials according to each type of prediction error. To do this, we constructed additional GLMs for each participant using the same regressors as before. However, instead of a single parametric modulator of outcomes by prediction errors, we added parametric modulators of outcomes by value updates (i.e., \({Q}_{t}+\alpha * {\delta }_{t}\)) corresponding to self-relevant positive prediction errors, self-relevant negative prediction errors, other-relevant positive prediction errors, and other-relevant negative prediction errors. For each type of value update, we extracted the mean parameter estimates from a priori regions of interest that had been previously linked to prediction error coding: ventral striatum, subgenual anterior cingulate, anterior insula, amygdala, and pregenual anterior cingulate. We then tested whether the encoding strength differed from zero using group-level non-parametric analyses with 10,000 permutations to estimate a two-sided p-value and an FDR-corrected threshold of q < 0.05 across the ten regions (five regions per hemisphere). We also conducted this analysis across the whole brain using FWE correction at p < 0.001. These analyses focused on identifying brain regions that encode value updates from different types of prediction errors rather than directly comparing the relative strength of encoding between different prediction error types. To explore differences in encoding strength across the regions of interest, we also conducted a series of mixed-effects model analyses (Supplementary Information).
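Sorting each outcome's value update into one of the four parametric modulators can be sketched as follows. The learning rates and trial sequence are hypothetical, and the actual regressors were constructed for AFNI GLMs rather than in this form:

```python
# Sketch: assign each outcome's value update (Q_t + alpha * delta_t) to one
# of four parametric modulators by target (self/other) and PE valence (+/-).

# Hypothetical target- and valence-specific learning rates.
alphas = {("self", "pos"): 0.40, ("self", "neg"): 0.30,
          ("other", "pos"): 0.25, ("other", "neg"): 0.15}

# One modulator per prediction-error type; each collects that type's updates.
modulators = {key: [] for key in alphas}

q = 0.0
trials = [(1.0, "self"), (-1.0, "other"), (1.0, "other")]  # (outcome, target)
for outcome, target in trials:
    delta = outcome - q
    valence = "pos" if delta >= 0 else "neg"
    q = q + alphas[(target, valence)] * delta   # value carried to next trial
    modulators[(target, valence)].append(q)     # modulate this outcome event
```

Each list then parametrically modulates only the outcome events of its own prediction-error type, which is what allows the four value-update signals to be estimated separately within the same GLM.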
Software
In AFNI81, we conducted first-level analyses using 3dDeconvolve and second-level analyses using 3dttest++. Anatomical labeling was carried out using bspmview and the anatomy toolbox labeling scheme82. Plotting of behavioral data was carried out using matplotlib83 and seaborn84. Plotting of neural data was carried out using Nilearn version 0.8.185.
Regions of interest specification
ROIs were pre-registered and based upon previously published brain atlas parcellations. We used the Harvard-Oxford atlas for ventral striatum and amygdala86,87,88,89. We used the automated anatomical labeling (AAL) atlas for anterior insula90 and constrained voxels to be anterior to y = 091,92. We used the subgenual and pregenual anterior cingulate mask derived from cytoarchitecture and meta-analytic functional connectivity modeling from Palomero-Gallagher et al.93,94 and the JuBrain Anatomy Toolbox95.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The processed data, including the thresholded and unthresholded brain maps, are available on the Open Science Framework (https://osf.io/2gnv4/).
Code availability
Computer code is available on the Open Science Framework (https://osf.io/2gnv4/). The Python implementation of the expectation maximization algorithm with maximum a posteriori (MAP) estimation is also available via the pyEM package (https://github.com/shawnrhoads/pyEM)96.
References
Aknin, L. B., Whillans, A. V., Norton, M. I. & Dunn, E. W. Happiness and prosocial behavior: an evaluation of the evidence. in World Happiness Report 2019 (eds. Helliwell, J. F., Layard, R. & Sachs, J. D.) 67–85 (World Happiness Report, 2019).
de Waal, F. B. M. Putting the altruism back into altruism: the evolution of empathy. Annu. Rev. Psychol. 59, 279–300 (2008).
Fehr, E., Rockenbach, B., Borst, A. & Schultz, W. Human altruism: economic, neural, and evolutionary perspectives. Curr. Opin. Neurobiol. 14, 784–790 (2004).
Rhoads, S. A. & Marsh, A. A. Doing good and feeling good: relationships between altruism and well-being for altruists, beneficiaries, and observers. in World Happiness Report (eds. Helliwell, J. F. et al.) 103–130 (United Nations Sustainable Development Solutions Network, 2023).
Neumann, C. S. & Hare, R. D. Psychopathic traits in a large community sample: links to violence, alcohol use, and intelligence. J. Consult. Clin. Psychol. 76, 893 (2008).
Yu, R., Geddes, J. R. & Fazel, S. Personality disorders, violence, and antisocial behavior: a systematic review and meta-regression analysis. J. Pers. Disord. 227, 481–491 (2012).
Pauli, R. & Lockwood, P. L. The computational psychiatry of antisocial behaviour and psychopathy. Neurosci. Biobehav. Rev. 145, 104995 (2022).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Schultz, W. Dopamine reward prediction error coding. Dialogues Clin. Neurosci. 18, 23–32 (2016).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. (The MIT Press, 2018).
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
Levy, D. J. & Glimcher, P. W. The root of all value: a neural common currency for choice. Curr. Opin. Neurobiol. 22, 1027–1038 (2012).
Bartra, O., McGuire, J. T. & Kable, J. W. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76, 412–427 (2013).
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J. & Frith, C. D. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042–1045 (2006).
Seymour, B. et al. Temporal difference models describe higher order learning in humans. Nature 429, 664–667 (2004).
D’Ardenne, K. et al. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267 (2008).
Niv, Y., Edlund, J. A., Dayan, P. & O’Doherty, J. P. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. J. Neurosci. 32, 551–562 (2012).
Rosenbaum, G. M., Grassie, H. L. & Hartley, C. A. Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory. Elife 11, e64620 (2022).
Sul, S. et al. Spatial gradient in value representation along the medial prefrontal cortex reflects individual differences in prosociality. Proc. Natl. Acad. Sci. USA 112, 7851–7856 (2015).
Cutler, J., Wittmann, M., Abdurahman, A. & Hargitai, L. Ageing disrupts reinforcement learning whilst learning to help others is preserved. Nat. Commun. 12, 4440 (2021).
Lockwood, P. L., Apps, M. A. J., Valton, V., Viding, E. & Roiser, J. P. Neurocomputational mechanisms of prosocial learning and links to empathy. Proc. Natl. Acad. Sci. USA 113, 9763–9768 (2016).
Lockwood, P. L., Apps, M. A. J., Roiser, J. P. & Viding, E. Encoding of vicarious reward prediction in anterior cingulate cortex and relationship with trait empathy. J. Neurosci. 35, 13720–13727 (2015).
Lengersdorff, L. L., Wagner, I. C., Lockwood, P. L. & Lamm, C. When implicit prosociality trumps selfishness: the neural valuation system underpins more optimal choices when learning to avoid harm to others than to oneself. J. Neurosci. 40, 7286–7299 (2020).
Marsh, A. A. The caring continuum: evolved hormonal and proximal mechanisms explain prosocial and antisocial extremes. Annu. Rev. Psychol. 70, 20–25 (2019).
Thielmann, I., Spadaro, G. & Balliet, D. Personality and prosocial behavior: a theoretical framework and meta-analysis. Psychol. Bull. 146, 30–90 (2020).
O’Connell, K., Walsh, M., Padgett, B., Connell, S. & Marsh, A. A. Modeling variation in empathic sensitivity using go/no-go social reinforcement learning. Affect. Sci. https://doi.org/10.1007/s42761-022-00119-4 (2022).
Cutler, J. & Campbell-Meiklejohn, D. A comparative fMRI meta-analysis of altruistic and strategic decisions to give. Neuroimage 184, 227–241 (2019).
Rhoads, S. A., Cutler, J. & Marsh, A. A. A feature-based network analysis and fMRI meta-analysis reveal three distinct types of prosocial decisions. Soc. Cogn. Affect. Neurosci. https://doi.org/10.1093/scan/nsab079 (2021).
Zhang, L. & Gläscher, J. A brain network supporting social influences in human decision-making. Sci. Adv. 6, 1–19 (2019).
Hill, M. R., Boorman, E. D. & Fried, I. Observational learning computations in neurons of the human anterior cingulate cortex. Nat. Commun. 7, 12722 (2016).
Charpentier, C. J., Iigaya, K. & O’Doherty, J. P. A neuro-computational account of arbitration between choice imitation and goal emulation during human observational learning. Neuron 1–38 https://doi.org/10.1016/j.neuron.2020.02.028 (2020).
Suzuki, S. et al. Learning to simulate others’ decisions. Neuron 74, 1125–1137 (2012).
Ruff, C. C. & Fehr, E. The neurobiology of rewards and values in social decision making. Nat. Rev. Neurosci. 15, 549–562 (2014).
Waytz, A., Zaki, J. & Mitchell, J. P. Response of dorsomedial prefrontal cortex predicts altruistic behavior. J. Neurosci. 32, 7646–7650 (2012).
Bas, L. M., Roberts, I. D., Hutcherson, C. A. & Tusche, A. A neurocomputational account of the link between social perception and social action. Elife 12, RP92539 (2023).
Gaesser, B. & Schacter, D. L. Episodic simulation and episodic memory can increase intentions to help others. Proc. Natl. Acad. Sci. USA 111, 4415–4420 (2014).
Konovalov, A., Hu, J. & Ruff, C. C. Neurocomputational approaches to social behavior. Curr. Opin. Psychol. 24, 41–47 (2018).
Rilling, J. K. & Sanfey, A. G. The neuroscience of social decision-making. Annu. Rev. Psychol. 62, 23–48 (2011).
Patrick, C. J., Fowles, D. C. & Krueger, R. F. Triarchic conceptualization of psychopathy: Developmental origins of disinhibition, boldness, and meanness. Dev. Psychopathol. 21, 913–938 (2009).
Chase, H. W., Kumar, P., Eickhoff, S. B. & Dombrovski, A. Y. Reinforcement learning models and their neural correlates: an activation likelihood estimation meta-analysis. Cogn. Affect. Behav. Neurosci. 15, 435–459 (2015).
Rutledge, R. B., Dean, M., Caplin, A. & Glimcher, P. W. Testing the reward prediction error hypothesis with an axiomatic model. J. Neurosci. 30, 13525–13536 (2010).
Joiner, J., Piva, M., Turrin, C. & Chang, S. W. C. Social learning through prediction error in the brain. Sci. Learn. 2, 8 (2017).
Lockwood, P. L. & Wittmann, M. K. Ventral anterior cingulate cortex and social decision-making. Neurosci. Biobehav. Rev. 92, 187–191 (2018).
Yacubian, J. et al. Dissociable systems for gain- and loss-related value predictions and errors of prediction in the human brain. J. Neurosci. 26, 9530–9537 (2006).
Will, G.-J., Rutledge, R. B., Moutoussis, M. & Dolan, R. J. Neural and computational processes underlying dynamic changes in self-esteem. Elife 6, 1–21 (2017).
Nicolle, A. et al. An agent independent axis for executed and modeled choice in medial prefrontal cortex. Neuron 75, 1114–1121 (2012).
Schulreich, S., Gerhardt, H., Meshi, D. & Heekeren, H. R. Fear-induced increases in loss aversion are linked to increased neural negative-value coding. Soc. Cogn. Affect. Neurosci. 15, 661–670 (2020).
Engelmann, J. B., Berns, G. S. & Dunlop, B. W. Hyper-responsivity to losses in the anterior insula during economic choice scales with depression severity. Psychol. Med. 47, 2879–2891 (2017).
Engelmann, J. B., Meyer, F., Fehr, E. & Ruff, C. C. Anticipatory anxiety disrupts neural valuation during risky choice. J. Neurosci. 35, 3085–3099 (2015).
Blair, R. J. Neurocognitive models of aggression, the antisocial personality disorders, and psychopathy. J. Neurol. Neurosurg. Psychiatry 71, 727–731 (2001).
Blair, R. J. The motivation of aggression: a cognitive neuroscience approach and neurochemical speculations. Motiv. Sci. 8, 106–120 (2022).
Lockwood, P. L. et al. Neural mechanisms for learning self and other ownership. Nat. Commun. 9, 4747 (2018).
Diaconescu, A. O. et al. Hierarchical prediction errors in midbrain and septum during social learning. Soc. Cogn. Affect. Neurosci. 12, 618–634 (2017).
Iglesias, S. et al. Hierarchical prediction errors in midbrain and basal forebrain during sensory learning. Neuron 80, 519–530 (2013).
Garrison, J., Erdeniz, B. & Done, J. Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies. Neurosci. Biobehav. Rev. 37, 1297–1310 (2013).
Cohen, J. D. et al. Computational approaches to fMRI analysis. Nat. Neurosci. 20, 304–313 (2017).
Alexander, W. H. & Brown, J. W. The role of the anterior cingulate cortex in prediction error and signaling surprise. Top. Cogn. Sci. 11, 119–135 (2019).
Roesch, M. R., Calu, D. J., Esber, G. R. & Schoenbaum, G. All that glitters dissociating attention and outcome expectancy from prediction errors signals. J. Neurophysiol. 104, 587–595 (2010).
Li, J., Schiller, D., Schoenbaum, G., Phelps, E. A. & Daw, N. D. Differential roles of human striatum and amygdala in associative learning. Nat. Neurosci. 14, 1250–1252 (2011).
Martins, D., Lockwood, P., Cutler, J., Moran, R. & Paloyelis, Y. Oxytocin modulates neurocomputational mechanisms underlying prosocial reinforcement learning. Prog. Neurobiol. 213, 102253 (2022).
Fehr, E. & Schmidt, K. M. A theory of fairness, competition, and cooperation. Q. J. Econ. 114, 817–868 (1999).
Li, J. & Daw, N. D. Signals in human striatum are appropriate for policy update rather than value prediction. J. Neurosci. 31, 5504–5511 (2011).
Wittmann, M. K. et al. Global reward state affects learning and activity in raphe nucleus and anterior insula in monkeys. Nat. Commun. 11, 1–17 (2020).
Huys, Q. J. M. et al. Disentangling the roles of approach, activation and valence in instrumental and pavlovian responding. PLoS Comput. Biol. 7, e1002028 (2011).
MacKay, D. J. C. Information Theory, Inference and Learning Algorithms. (Cambridge University Press, 2003).
Wilson, R. C. & Collins, A. G. Ten simple rules for the computational modeling of behavioral data. Elife 8, e49547 (2019).
Lockwood, P. L. & Klein-Flügge, M. C. Computational modelling of social cognition and behaviour—a reinforcement learning primer. Soc. Cogn. Affect. Neurosci. 16, 761–771 (2020).
Palminteri, S., Wyart, V. & Koechlin, E. The importance of falsification in computational cognitive modeling. Trends Cogn. Sci. 21, 425–433 (2017).
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
Palminteri, S., Khamassi, M., Joffily, M. & Coricelli, G. Contextual modulation of value signals in reward and punishment learning. Nat. Commun. 6, 1–14 (2015).
Peirce, J. et al. PsychoPy2: experiments in behavior made easy. Behav. Res. Methods 51, 195–203 (2019).
Reniers, R. L. E. P., Corcoran, R., Drake, R., Shryane, N. M. & Völlm, B. A. The QCAE: a questionnaire of cognitive and affective empathy. J. Pers. Assess. 93, 84–95 (2011).
Murphy, R. O., Ackermann, K. A. & Handgraaf, M. J. J. Measuring social value orientation. Judgm. Decis. Mak. 6, 771–781 (2011).
Drislane, L. E., Patrick, C. J. & Arsal, G. Clarifying the content coverage of differing psychopathy inventories through reference to the Triarchic Psychopathy Measure. Psychol. Assess. 26, 350–362 (2014).
van Dongen, J. D. M., Drislane, L. E., Nijman, H., Soe-Agnie, S. E. & van Marle, H. J. C. Further evidence for reliability and validity of the triarchic psychopathy measure in a forensic sample and a community sample. J. Psychopathol. Behav. Assess. 39, 58–66 (2017).
Deichmann, R., Gottfried, J. A., Hutton, C. & Turner, R. Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage 19, 430–441 (2003).
Weiskopf, N., Hutton, C., Josephs, O., Turner, R. & Deichmann, R. Optimized EPI for fMRI studies of the orbitofrontal cortex: compensation of susceptibility-induced gradients in the readout direction. Magn. Reson. Mater. Phys. Biol. Med. 20, 39–49 (2007).
Esteban, O. et al. FMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16, 111–116 (2018).
Penny, W. D. et al. Comparing families of dynamic causal models. PLoS Comput. Biol. 6, e1000709 (2010).
Daunizeau, J., Adam, V. & Rigoux, L. VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data. PLoS Comput. Biol. 10, e1003441 (2014).
Cox, R. W. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29, 162–173 (1996).
Spunt, B. bspmview. https://doi.org/10.5281/ZENODO.168074 (2016).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Abraham, A. et al. Machine learning for neuroimaging with scikit-learn. Front. Neuroinform. 8, 1–10 (2014).
Makris, N. et al. Decreased volume of left and total anterior insular lobule in schizophrenia. Schizophr. Res. 83, 155–171 (2006).
Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980 (2006).
Goldstein, J. M. et al. Hypothalamic abnormalities in schizophrenia: sex effects and genetic vulnerability. Biol. Psychiatry 61, 935–945 (2007).
Frazier, J. A. et al. Structural brain magnetic resonance imaging of limbic and thalamic volumes in pediatric bipolar disorder. Am. J. Psychiatry 162, 1256–1265 (2005).
Tzourio-Mazoyer, N. et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289 (2002).
Lockwood, P. L. et al. Association of callous traits with reduced neural response to others’ pain in children with conduct problems. Curr. Biol. 23, 901–905 (2013).
O’Connell, K. et al. Increased similarity of neural responses to experienced and empathic distress in costly altruism. Sci. Rep. 9, 1–11 (2019).
Palomero-Gallagher, N. et al. Functional organization of human subgenual cortical areas: relationship between architectonical segregation and connectional heterogeneity. Neuroimage 115, 177–190 (2015).
Palomero-Gallagher, N. & Zilles, K. Cortical layers: cyto-, myelo-, receptor- and synaptic architecture in human cortical areas. Neuroimage 197, 716–741 (2019).
Eickhoff, S. B. et al. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 24, 1325–1335 (2005).
Rhoads, S. A. pyEM: expectation maximization with MAP estimation in Python. Zenodo. https://doi.org/10.5281/zenodo.10415396 (2023).
Acknowledgements
This work was funded by a National Science Foundation Graduate Research Fellowship Program Award (#1937959) to S.A.R., the Mistletoe Unfettered Research Grant to S.A.R., and a National Science Foundation award (#2139925) to A.A.M. We are grateful to Kinney Van Hecke, Ashley S. VanMeter, and the Georgetown Center for Functional and Molecular Imaging staff for their help with data collection. We also thank Joanna Li, Heather Doherty, Olivia Young, and Joy Chung for their help with participant recruitment, Marla Dressel, John Chin, Cait Jenkyn, Kate Bohigian, Joy Chung, Graeme Morland-Tellez, and Grace Hardymon for their assistance piloting the two-option version of the task, Montana Ploe and Paige Freeberg for their help with participant compensation, and to all of the participants who dedicated their time for this work.
Author information
Authors and Affiliations
Contributions
S.A.R.: Conceptualization, methodology, investigation, data curation, software, formal analysis, visualization, writing—original draft, writing—review and editing, funding acquisition, supervision, project administration; L.G.: Investigation, data curation, writing—review and editing; K.B.: Conceptualization, investigation; K.O.: Conceptualization; J.C.: Conceptualization, software, writing— review and editing; P.L.L.: Conceptualization, software, writing—review and editing; A.A.M.: Conceptualization, funding acquisition, resources, writing— original draft, writing—review and editing.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Rhoads, S.A., Gan, L., Berluti, K. et al. Neurocomputational basis of learning when choices simultaneously affect both oneself and others. Nat Commun 16, 9350 (2025). https://doi.org/10.1038/s41467-025-64424-9