Introduction

Mood-related disorders pose significant challenges to emotional regulation and psychological well-being and affect hundreds of millions of individuals worldwide. For many of these conditions, a pervasive negative affect is a hallmark1,2, often accompanied by impaired stress management and emotional instability3,4. Mounting evidence implicates dysfunction within interconnected networks of cortical and subcortical regions, including the prefrontal cortex, amygdala, and striatum, as a primary neurobiological basis of these symptoms3,5. Altered excitability in prefrontal-limbic circuits has also been observed in neuromodulation studies targeting emotion regulation networks6. Coactivation of the anterior cingulate cortex (ACC) and the orbitofrontal cortex (OFC) has been reported in human neuroimaging studies investigating emotion regulation7,8 and mood disorders9,10, suggesting that these regions have coordinating functions involved in these processes.

The ACC is thought to support multiple components of motivated behavior, including cost-benefit decision-making11,12, effortful control and adaptive learning13,14, and valuation under approach-avoidance conflict15. In particular, the pregenual ACC (pACC) has been implicated in negative affective biases that contribute to pessimistic decision-making16,17, and has been associated with clinical symptoms observed in depressive disorders18,19.

The OFC receives convergent inputs from gustatory, olfactory, visual, and somatosensory systems20. It has been reported to encode the value of expected outcomes21, to represent subjective value signals independently of specific sensory or motor contingencies22, and to contribute to learning predictive associations between cues and outcomes, particularly in the context of reward and aversion23. Furthermore, it has been reported to underpin adaptive behavior, enabling individuals to adjust their actions in response to changing circumstances24,25.

Pioneering functional studies treated the OFC as a relatively homogeneous region in terms of value representation22,26,27, but functional distinctions across OFC subregions have been identified, including medial versus lateral divisions28 and rostral versus caudal subregions29. Among the subregions of the OFC, the caudal OFC (cOFC) shares important anatomical connectivity with the pACC. Both regions are classified as dysgranular cortical areas, characterized by a relatively sparse granular layer compared to other OFC subregions30,31. Notably, these two subregions are anatomically interconnected32, form strong reciprocal connections with the amygdala33,34 and, motivating the study reported here, project preferentially to the striosome compartment of the striatum35,36.

Given these findings and others3,37,38, we asked how the cOFC and pACC might differentially contribute to affective decision-making and how they might be coupled in regulating motivational processes in primates. We recorded from neurons in both the cOFC and the pACC in macaque monkeys performing a modified version of the approach-avoidance (Ap-Av) decision-making task. To test for causal contributions, we applied electrical microstimulation (EMS) to the cOFC, extending previous EMS work in the pACC and striatum39,40. We chose the Ap-Av task because it was originally developed to support pharmaceutical research and drug testing in rodents41, was subsequently adapted for both healthy and clinical human populations18,42, and was modified for use in macaque monkeys to examine pACC and striatal functions39,40. It has further been implemented in two-choice paradigms to investigate decision-making mechanisms in rats and mice43,44,45. Here, we conducted multi-site neuronal recordings from both regions while simultaneously monitoring autonomic responses and reaction times (RTs) on a trial-by-trial basis.

Our findings demonstrate that neurons in the cOFC exhibited a balance of excitatory and inhibitory responses during the cue period, when cost-benefit offers were presented. This activity pattern contrasted with that of simultaneously recorded pACC neurons, which showed a predominance of inhibitory responses. At the population level, cOFC activity increased earlier than pACC activity. EMS of the cOFC, using either high (150–200 μA) or low (5–15 μA) currents, increased avoidance behavior in the monkeys, suggesting that the cOFC could have a causal influence in inducing a pessimistic bias in cost-benefit evaluation. We also found close correlations between the monkeys' behavioral patterns and the autonomic and RT parameters measured during the decision-making task, pointing to a potential form of brain-body synchronization characteristic of mood disorders.

Results

Two rhesus macaques, designated as subjects P and D, were trained for approximately two years, progressing from a task-naive to a task-expert state. The monkeys learned to make directional saccades to one of two targets to report acceptance or rejection of a series of visually presented symbols representing compound offers of reward and aversion, as described previously39. In the expert state, the monkeys received combined offers of reinforcement, denoted by symbols presented on a computer screen in front of them (Fig. 1). Each symbol was composed of two abutting horizontal bars whose lengths, each independently variable over ~200 levels, represented how much positive and negative reinforcement the monkeys would receive if they accepted the offer. Rewarding juice served as the incentive, and an aversive airpuff applied to the face served as the deterrent.

Fig. 1: Experimental tasks.

During the cue period of the Ap-Av task (red dotted rectangle), red and yellow horizontal bars signaled, respectively, the offered amounts of reward and aversion on the monitor. The monkeys made a decision between acceptance and rejection of the combined offer and reported this by choosing either of two targets (cross for acceptance/approach; square for rejection/avoidance) that appeared during the response period. The locations of the peripheral targets were alternated randomly. During the Pavlovian trials within the Ap-Av task, single cues were presented immediately after the fixation period, and the monkey experienced the respective positive or negative outcomes. During Ap-Av-Ap-Ap sessions (purple dotted rectangle), the Ap-Av task (upper diagram) was presented in the first and third blocks, and the Ap-Ap task (lower diagram) was presented in the second and fourth blocks. During the Ap-Ap cue period, the red and yellow horizontal bars on the monitor signaled two different offered amounts of reward. The monkeys made a decision between the two alternative options and reported this by choosing either of two targets (cross for choosing the red bar; square for choosing the yellow bar) that appeared during the response period. The locations of the targets were again alternated randomly. During the Ap-Ap Pavlovian trials, single cues were presented right after the fixation period, and the monkey experienced the respective positive outcomes indicated by the length of either cue. During the Ap-Av EMS task (green dotted rectangles), the Ap-Av task without stimulation was presented in the first and third blocks, and the Ap-Av task with EMS was presented in the second block. During the Ap-Av EMS block, EMS was delivered from cue onset for 1 or 1.5 s of the cue period. The red and yellow horizontal bars, respectively signaling the offered amounts of reward and aversion, appeared on the monitor. 
The monkeys made a decision between acceptance and rejection of the combined offer and reported this by choosing either of two targets (cross for acceptance/approach; square for rejection/avoidance) that appeared during the response period. The locations of the targets were alternated randomly. During the Pavlovian trials, EMS was delivered as for the choice trials.

During daily sessions (5 d/wk), combinations of the two bars were presented to the monkeys as the visual cue for decision-making (see Methods). Approximately 40,000 combinations of reward and aversion were possible and were presented in random order, in an attempt to mimic the complexity inherent in real-world decision-making. In addition to the choice trials, Pavlovian trials made up 15% of trials; in these, the animal experienced either a positive or a negative outcome after a single cue was presented, so no decision comparable to that in the main task was required. In most cases, a Pavlovian aversion trial was followed by a Pavlovian reward trial, and vice versa. In Pavlovian trials, after the fixation period, a single cue was presented on the screen, and the monkey needed only to fixate on the cue for the first 300 ms for the cue to stay on for the full 1.5-s presentation, followed by reward or airpuff. Fixation was considered successful as long as the eye position stayed within the 3° cue radius for the required time. Fixation breaks were allowed after 300 ms because otherwise the monkeys would often break fixation to forfeit Pavlovian aversion trials. Because Pavlovian trials were much less numerous, and to investigate the parametric modulation of electrophysiological and physiological activity, we presented bars of only certain evenly spaced, preset sizes that ensured good coverage of the 200-level range of reward and aversion presented during the choice trials (see Methods). The outcome was then delivered after a randomized delay uniformly distributed between 600 and 800 ms.
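The trial structure described above can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not the task software used in these experiments. It assumes that both bar lengths are drawn uniformly and independently from the integers 5–200 (the bar range reported for choice trials), that Pavlovian trials occur with probability 0.15 and strictly alternate between reward and aversion (the text states this held in most cases), and that the 600–800 ms outcome delay applies to Pavlovian trials. The function name make_session and the dictionary fields are illustrative.

```python
import random

BAR_MIN, BAR_MAX = 5, 200                        # bar-length range reported for choice trials
PAVLOVIAN_FRACTION = 0.15                        # share of Pavlovian trials
PAVLOVIAN_SIZES = [30, 60, 90, 120, 150, 180]    # preset Pavlovian bar sizes
OUTCOME_DELAY_MS = (600.0, 800.0)                # uniform outcome-delay range (assumed Pavlovian only)

def make_session(n_trials, seed=0):
    """Simulate a hypothetical trial list for one Ap-Av session."""
    rng = random.Random(seed)
    trials = []
    next_valence = "reward"                       # Pavlovian trials alternate valence
    for _ in range(n_trials):
        if rng.random() < PAVLOVIAN_FRACTION:
            trials.append({
                "type": "pavlovian",
                "valence": next_valence,
                "size": rng.choice(PAVLOVIAN_SIZES),
                "delay_ms": rng.uniform(*OUTCOME_DELAY_MS),
            })
            next_valence = "aversion" if next_valence == "reward" else "reward"
        else:
            # Reward (red) and aversion (yellow) lengths vary independently.
            trials.append({
                "type": "choice",
                "reward": rng.randint(BAR_MIN, BAR_MAX),
                "aversion": rng.randint(BAR_MIN, BAR_MAX),
            })
    return trials
```

With 196 possible lengths per bar, this scheme yields 196 × 196 = 38,416 combinations, in line with the approximately 40,000 stated above. A call such as make_session(1200) returns a list in which roughly 180 entries are Pavlovian.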

Behavioral patterns during task performance

The animals performed 1050–1500 trials per recording session, with sessions lasting 4–6 h. After the monkeys mastered the Ap-Av task, behavioral patterns like those depicted in Fig. 2a, b (top panel) emerged. For analysis, we included only sessions in which the decision boundaries had a positive slope (203 sessions, 96.2% of all sessions), indicating that the animals performed the task well, consistent with an optimal strategy of accepting offers more often when they came with larger reward and smaller aversion (Fig. 2c). We excluded a small number of sessions (8 sessions, 3.8% of all sessions) in which the decision boundaries had a negative slope. The monkeys' daily choices varied, partly because of the random combinations of cues presented each day and partly because on some occasions they rapidly saccaded to the unfavorable side of the screen. To minimize gaze-dependent distortion in pupil-size measurements, all trials in which eye position deviated from the central fixation window during the cue period were excluded from analysis.

Fig. 2: Physiological measurements across all experimental tasks.

Example session of Ap-Av task performance showing the original data (a) and a smoothed version of the same plot (b). The x-axis of these panels represents the percentage of reward (i.e., juice) offered, and the y-axis denotes the percentage of aversive outcome (i.e., airpuff) offered. Trials in which the subjects chose to approach (accept) the offers are marked with green crosses, and trials in which they chose to avoid (decline) the offers are marked with red squares. Decision boundaries were derived from a generalized linear model fitted with the MATLAB glmfit function (MathWorks, Natick, MA). The black line represents the 50% probability decision boundary; the blue dashed line shows the 60% probability boundary, highlighting regions where approach behavior was more predominant; and the orange dashed line marks the 40% probability boundary, highlighting regions where approach behavior was less predominant. Mean choice pattern (c) and difference in RTs (d; see Methods) during the Ap-Av task averaged across all sessions, showing a significant correlation between RTs and observed choices (r = 0.55, p = 4.6 × 10⁻¹¹⁹, two-sided Pearson correlation). Average choice pattern (e) and difference in RTs (f) during the Ap-Av blocks of the Ap-Av-Ap-Ap task show a positive and significant correlation (r = 0.18, p = 4.9 × 10⁻¹³, two-sided Pearson correlation), highlighting optimal behavior. Average choice pattern (g) and difference in RTs (h) during the Ap-Ap blocks of the Ap-Av-Ap-Ap task, with a significant positive correlation (r = 0.27, p = 4.5 × 10⁻²⁸, two-sided Pearson correlation). Average behavior (i) and difference in RTs (j) in the Ap-Av EMS task, where behavior was again significantly correlated with RTs (r = 0.21, p = 1.6 × 10⁻⁵, two-sided Pearson correlation). The decision boundaries from the behavior plots (left column) are overlaid on the RT plots (right column) for illustration. 
Also see Supplementary Fig. 1a. No adjustments for multiple comparisons were applied. Source data are provided as a Source Data file.
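The boundary construction in Fig. 2a, b can be sketched as follows, assuming a binomial GLM with a logit link (the glmfit settings are not stated in this section). If p(Ap) = σ(b0 + b1·Reward + b2·Aversion), the iso-probability line for level q is Aversion = (logit(q) − b0 − b1·Reward)/b2, so every boundary has slope −b1/b2. That slope is positive when reward promotes and aversion suppresses approach (b1 > 0, b2 < 0), which corresponds to the session-inclusion criterion described above. The sketch fits the model by iteratively reweighted least squares in NumPy rather than calling glmfit, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit p(Ap) = 1 / (1 + exp(-X @ beta)) by iteratively reweighted least squares.

    X: (n_trials, 3) array with columns [1, Reward, Aversion]; y: 1 for Ap, 0 for Av.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton-Raphson step: beta += (X^T W X)^{-1} X^T (y - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def boundary_aversion(beta, reward, prob=0.5):
    """Aversion level at which the fitted p(Ap) equals prob for a given reward level."""
    b0, b1, b2 = beta
    return (np.log(prob / (1.0 - prob)) - b0 - b1 * reward) / b2

def boundary_slope(beta):
    """Common slope of all iso-probability boundaries in (Reward, Aversion) space."""
    return -beta[1] / beta[2]
```

Under these assumptions, a session would pass the inclusion criterion when boundary_slope(beta) > 0, and the 60% and 40% lines are obtained with prob=0.6 and prob=0.4.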

As a control decision-making task, we introduced an approach-approach (Ap-Ap) task that was largely identical to the Ap-Av task, except that both the red and yellow bars signaled reward amounts (Fig. 1; bottom half of purple dotted rectangle). The reward offered by the red bar was delivered when the monkey saccaded to the cross, whereas the reward offered by the yellow bar was delivered when the monkey saccaded to the square. As in the Ap-Av task, Pavlovian trials were presented, now corresponding to the new contingencies. This experiment was conducted in combination with the Ap-Av task: in each session, two equal-size blocks of the Ap-Av task were interleaved with two equal-size blocks of the Ap-Ap task (i.e., Ap-Av, Ap-Ap, Ap-Av, Ap-Ap), forming the Ap-Av-Ap-Ap task. Finally, the animals were given the Ap-Av EMS task, which was identical to the Ap-Av task except that the session was divided into three blocks with equal numbers of trials (Fig. 1; green dotted rectangle). In the middle block, we microstimulated the cOFC during the full cue period (1–1.5 s) with either high (150–200 μA) or low (5–15 μA) currents to determine whether and how different stimulation levels influence decision-making related to reward and aversion27,36,39,40.

We note several differences between our current protocols and previous studies with the Ap-Av and Ap-Ap tasks. EMS sessions were originally performed only under high (150–200 μA) currents36,39,40, and a joystick was used by the monkeys to move a cursor on the screen in front of them. In the version of the experiments developed here, we used, as noted, both high and low currents (150–200 μA and 5–15 μA), and the monkeys used saccadic eye movements to indicate their choice, avoiding the need to make a change in reference frames to indicate their responses16,36,40. Here, we also added estimates of the internal states of the animals during different task periods by recording lick rate, pupil diameter, RT and heart rate variability (HRV), and, as noted, added Pavlovian trials46 as further internal controls.

We analyzed the distribution of RTs relative to the cued offers, separately for approach (Ap) and avoidance (Av) choices (Supplementary Fig. 1a). RT was defined as the interval between the offset of the cue and the selection of one of the two peripheral stimuli (cross or square). The differences in RTs between Ap and Av trials (Fig. 2d) were significantly and positively correlated with the observed choices of the animals (Fig. 2c). During the Ap-Av blocks of the Ap-Av-Ap-Ap task, the animals exhibited near-optimal behavior, in which approach choices were positively and significantly correlated with the differences in RT (Fig. 2e, f). During the Ap-Ap blocks of the same session type, the animals also exhibited near-optimal behavior, choosing more often the larger of the two bars (Fig. 2g). Once again, the differences in RT were positively correlated with the decisions of the animals (Fig. 2h). In the cOFC EMS Ap-Av task, RT differences remained positively correlated with the animals’ Ap-Av choices, suggesting that aspects of deliberative processing could still be preserved under conditions of cOFC microstimulation (Fig. 2i, j).
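One way to compute the choice/RT correlations reported in Fig. 2c–j is sketched below. The binning scheme (a 10 × 10 grid over bar lengths 5–200), the sign convention (mean Ap RT minus mean Av RT within each bin), and the exclusion of bins lacking either choice type are assumptions for illustration; the definition used in the study is given in Methods. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def binned_maps(reward, aversion, chose_ap, rt, n_bins=10, lo=5, hi=200):
    """Per-bin p(Ap) and RT difference (mean Ap RT minus mean Av RT) on a reward x aversion grid."""
    chose_ap = np.asarray(chose_ap, dtype=bool)
    rt = np.asarray(rt, dtype=float)
    edges = np.linspace(lo, hi, n_bins + 1)
    ri = np.clip(np.digitize(reward, edges) - 1, 0, n_bins - 1)
    ai = np.clip(np.digitize(aversion, edges) - 1, 0, n_bins - 1)
    p_ap = np.full((n_bins, n_bins), np.nan)
    rt_diff = np.full((n_bins, n_bins), np.nan)
    for i in range(n_bins):
        for j in range(n_bins):
            in_bin = (ri == i) & (ai == j)
            if not in_bin.any():
                continue
            p_ap[i, j] = chose_ap[in_bin].mean()
            rt_ap, rt_av = rt[in_bin & chose_ap], rt[in_bin & ~chose_ap]
            if rt_ap.size and rt_av.size:       # need both choice types in the bin
                rt_diff[i, j] = rt_ap.mean() - rt_av.mean()
    return p_ap, rt_diff

def choice_rt_correlation(p_ap, rt_diff):
    """Two-sided Pearson correlation across bins where both maps are defined."""
    ok = ~np.isnan(p_ap) & ~np.isnan(rt_diff)
    r, p = pearsonr(p_ap[ok], rt_diff[ok])
    return r, p
```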

Based on the monkeys’ overall choice patterns, we classified the cue combinations into two types: Ap-favoring cues, which typically resulted in approach choices (for example, high reward and low aversion), and Av-favoring cues, which generally led to avoidance choices (such as high aversion and low reward). We then compared RTs between these cue types for each actual choice direction. When animals selected Ap, RTs were significantly longer in trials with Ap-favoring cues than in those with Av-favoring cues (Supplementary Fig. 1a, left), despite the same behavioral outcome (Wilcoxon rank-sum test, p < 0.0001). Similarly, when animals selected Av, RTs were significantly longer in trials with Av-favoring cues than in those with Ap-favoring cues (Supplementary Fig. 1a, right; Wilcoxon rank-sum test, p < 0.0001).
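The congruence analysis can be expressed compactly. In this hypothetical sketch, a cue is labeled Ap-favoring when the model-predicted approach probability exceeds 0.5; the study classified cues from the monkeys' overall choice patterns, so the precise rule is an assumption here. RTs for congruent and incongruent cues are then compared within each choice type with SciPy's two-sided Wilcoxon rank-sum test (scipy.stats.ranksums).

```python
import numpy as np
from scipy.stats import ranksums

def congruence_rt_tests(p_ap_hat, chose_ap, rt, criterion=0.5):
    """Compare RTs for congruent vs incongruent cues separately for Ap and Av choices.

    p_ap_hat: predicted p(Ap) for each trial's offer; cues with p_ap_hat > criterion are
    treated as Ap-favoring (a hypothetical rule).
    """
    chose_ap = np.asarray(chose_ap, dtype=bool)
    rt = np.asarray(rt, dtype=float)
    ap_favoring = np.asarray(p_ap_hat) > criterion
    results = {}
    for label, chosen, congruent in (("Ap", chose_ap, ap_favoring),
                                     ("Av", ~chose_ap, ~ap_favoring)):
        rt_cong = rt[chosen & congruent]
        rt_incong = rt[chosen & ~congruent]
        test = ranksums(rt_cong, rt_incong)      # two-sided by default
        results[label] = {
            "median_congruent": float(np.median(rt_cong)),
            "median_incongruent": float(np.median(rt_incong)),
            "p": float(test.pvalue),
        }
    return results
```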

These findings indicate that cue-congruent decisions, in which the choice matches the value suggested by the cue, were associated with longer RTs, whereas cue-incongruent decisions were executed more quickly, suggesting a more impulsive or reflexive response. This asymmetry in RTs suggests that the animals invested more cognitive effort when their decisions aligned with cue-based expectations of reward and aversion; the cue-incongruent decisions might have been careless mistakes made during lapses of attention to the task. To complement this analysis, we plotted RTs across all trial types as a function of reward and airpuff amounts (Supplementary Fig. 2). The smoothed heatmap revealed that RTs were shortest near the decision boundary, where subjective value is most uncertain, and became longer toward the extremes of the offer space, where options are clearly good or bad. This pattern suggests that decision latency was shortest where choice difficulty was highest, consistent with deliberation-related dynamics.

Neural activity during the Ap-Av task

We recorded 1712 well-isolated single units (1278 from the cOFC and 434 from the pACC) in monkeys P and D (Fig. 3 and Supplementary Fig. 3) across all Ap-Av tasks, using chronically implanted platinum-iridium probes and Plexon S-probes. We calculated the percentage of cOFC and pACC neurons that exhibited statistically significant changes in spike counts around task events (chi-square test, p < 0.05, comparing spike counts in the 1 s before and the 1 s after each event onset; Fig. 3a). cOFC units tended to be more active across most task events. This trend was particularly evident for fixation point onset and Pavlovian red bar reward delivery onset. For fixation point onset, the proportion of responsive units was significantly higher in the cOFC (chi-square = 7.48, p = 0.006), with 54.4% of cOFC neurons showing significant activity compared with 37.4% in the pACC. Similarly, for Pavlovian reward delivery onset (Red bar REW), the proportion was higher in the cOFC (chi-square = 6.67, p = 0.01), with 35.5% of cOFC neurons significantly active, in contrast to 23.2% in the pACC. However, two airpuff-related events showed the opposite pattern: approach airpuff onset (Ap air ON) and Pavlovian airpuff onset (Yellow air ON). For approach airpuff onset, the proportion of event-related units differed significantly between the cOFC (lower) and the pACC (higher) (chi-square = 11.37, p = 0.0007). Although the raw count of significantly active neurons was higher in the cOFC (670 neurons) than in the pACC (248 neurons), these counts corresponded to 52.8% of the recorded neurons in the cOFC and 57.5% in the pACC. 
Additionally, for Pavlovian yellow airpuff onset, the proportion of responsive units was significantly higher in the pACC (chi-square = 5.00, p = 0.03), with 35.6% of pACC neurons showing significant activity, similar to the pACC proportion observed at fixation point onset.
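The two levels of testing used here, per-unit responsiveness and a between-region comparison of responsive-unit proportions, can be sketched as follows. How spike counts entered the per-unit chi-square test is not specified in this section; the sketch assumes total counts in the 1-s pre- and post-event windows compared against equal expected counts (a 1-df goodness-of-fit test). The region comparison uses a 2 × 2 contingency table via scipy.stats.chi2_contingency, which applies Yates' continuity correction by default; whether the study used that correction is unknown. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

def unit_is_responsive(pre_counts, post_counts, alpha=0.05):
    """Assumed per-unit test: total spikes in the 1 s before vs 1 s after an event,
    compared against equal expected counts (1-df chi-square goodness of fit)."""
    pre, post = int(np.sum(pre_counts)), int(np.sum(post_counts))
    if pre + post == 0:
        return False                              # silent unit: nothing to test
    return chisquare([pre, post]).pvalue < alpha

def compare_regions(n_resp_a, n_total_a, n_resp_b, n_total_b):
    """2 x 2 chi-square test on responsive vs non-responsive unit counts in two regions."""
    table = [[n_resp_a, n_total_a - n_resp_a],
             [n_resp_b, n_total_b - n_resp_b]]
    chi2, p, _, _ = chi2_contingency(table)       # Yates correction on by default for 2 x 2
    return chi2, p
```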

Fig. 3: Event-related neural activity during Ap-Av task.

a Proportions of cOFC and pACC units with event-related responses during the Ap-Av task. The events plotted include the fixation point onset for all trials (Fix point ON), choice/offer cue onset (Choice cue ON), airpuff delivery onset during Ap choice trials (Ap air ON), reward delivery onset during Ap choice trials (Ap rew ON), small reward delivery onset during Av choice trials (Av rew ON), highlight around the cross for choosing Ap during choice trials (Ap highlight), highlight around the square for choosing Av during choice trials (Av highlight), Pavlovian trials reward cue onset (Red bar ON), Pavlovian trials airpuff cue onset (Yellow bar ON), Pavlovian trials reward delivery (Red bar REW), and Pavlovian trials airpuff delivery (Yellow air ON). Event-related responses were assessed using a two-sided chi-square test comparing spike counts 1 s before and 1 s after each event (α = 0.05). Significant differences between cOFC and pACC included fixation point onset (chi-square = 7.48, p = 0.006), Pavlovian reward delivery (Red bar REW; chi-square = 6.67, p = 0.01), approach airpuff onset (Ap air ON; chi-square = 11.37, p = 0.0007), and Pavlovian airpuff delivery (Yellow air ON; chi-square = 5.00, p = 0.03). Sample cOFC regions (b, shown in purple with white dotted outline) and sample pACC regions (c, shown in olive with white dotted outline) from which recordings were made. Also see Supplementary Fig. 5 for a more detailed population-level analysis of these event-related neural responses. Source data are provided as a Source Data file.

These findings indicate a significant trend for the recorded cOFC units to be more engaged during events related to reward processing and trial initiation, possibly reflecting attentional and reward-evaluation processes, whereas the recorded pACC units were more engaged during events associated with aversive outcomes, such as airpuff delivery following particular choices or cues. This contrasting pattern suggests that pACC units might be more involved than cOFC units in processing the consequences of aversive events, possibly contributing to learning and adaptation based on negative outcomes rather than merely responding to aversive stimuli.

We performed stepwise regression (Fig. 4), following the protocol of an earlier study39, to determine whether mean firing rates during the 1.5-s cue period were correlated with task-related variables. These variables included externally observable factors: Reward (the amount/duration of offered reward, indicated by the length of the red bar), Aversion (the amount/duration of offered airpuff, indicated by the length of the yellow bar), the binary Choice variable (Ap = 1, Av = 0), and RT. In addition, we considered interaction terms reflecting the value of the chosen option: Reward*Choice (chosen reward) and Aversion*Choice (chosen airpuff). Lastly, we incorporated two inferred internal constructs as previously introduced16: Eutility, representing the expected utility of the offer, and Conflict, reflecting cognitive tension during decision-making. We reasoned that if the cOFC or the pACC had a key function in integrating costs and benefits, the activity of the recorded neuronal populations would reflect both offered and chosen values, even if such neurons made up only a small percentage of each population.
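A minimal forward-selection sketch of the stepwise regression is shown below. It assumes a partial F-test entry criterion of p < 0.05 and omits the removal step that implementations such as MATLAB's stepwisefit also perform, so it illustrates the idea rather than reproducing the protocol of the earlier study. Each regressor (for example Reward, Aversion, Choice, or RT) is supplied as one value per trial, and the response is the cue-period firing rate; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def _rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def forward_stepwise(predictors, y, p_enter=0.05):
    """Forward selection with a partial F-test entry criterion.

    predictors: dict mapping a regressor name to a 1-D array (one value per trial);
    y: cue-period firing rate per trial. Returns selected names in entry order.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    selected = []
    X = np.ones((n, 1))                           # start from the intercept-only model
    while len(selected) < len(predictors):
        rss_cur = _rss(X, y)
        best_name, best_F, best_X = None, -np.inf, None
        for name, x in predictors.items():
            if name in selected:
                continue
            X_new = np.column_stack([X, x])
            rss_new = _rss(X_new, y)
            F = (rss_cur - rss_new) / (rss_new / (n - X_new.shape[1]))
            if F > best_F:                        # same df for all candidates, so max F = min p
                best_name, best_F, best_X = name, F, X_new
        if f_dist.sf(best_F, 1, n - best_X.shape[1]) >= p_enter:
            break                                 # no remaining candidate meets the entry criterion
        selected.append(best_name)
        X = best_X
    return selected
```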

Fig. 4: Classification of units recorded in the cOFC and pACC.

Results of stepwise regression analysis for the Ap-Av task in the cOFC (a) and the pACC (b). Regression variables are reward size/duration (Reward), airpuff size/duration (Aversion), expected utility (Eutility), choice (Choice), chosen reward (Choice*Reward), chosen airpuff (Choice*Aversion), conflict in decisions (Conflict), and RTs. The histogram shows the number of units (y-axis) whose cue-period activity was characterized by single (red/blue bars) or multiple (green bars) variables; the variables are indicated in the matrix below. For single variables, the bar colors indicate positive (red) and negative (blue) correlations with the units' cue-related mean firing rates. Source data are provided as a Source Data file.

To derive the subjective value (utility) of the chosen target, we approximated the behavior using the conditional logit model47 and inferred utility inversely48. We then used expected utility49 as an explanatory variable corresponding to the continuous form of chosen value50. As in our original study39, we used the entropy of the choice probabilities from the conditional logit model as an index of decision conflict, termed Conflict in the analyses. Stepwise regression (see Methods) indicated that activity during the decision period of the Ap-Av task was captured by linear combinations of the eight factors listed above (F-test, p < 0.05; Fig. 4). In the cOFC, the predictor accounting for the most units was RT (86 units), followed by Reward (60 units) and Conflict (53 units), with the remaining single predictors and several combinations of predictors following. By contrast, in the pACC, the predictor accounting for the most units was Reward*Choice (25 units), followed by RT (23 units) and then Aversion, Choice, and Conflict (15 units each).
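For two options, a conditional logit reduces to a logistic function of the utility difference, and the Conflict regressor becomes the binary entropy of the predicted choice probabilities. The sketch below assumes a utility for approach that is linear in Reward and Aversion, with avoidance as the zero-utility reference; these are simplifying assumptions, and the Eutility regressor is not reproduced. Conflict is expressed in bits, so it peaks at 1 when p(Ap) = 0.5, that is, on the 50% decision boundary.

```python
import numpy as np

def choice_probability(beta, reward, aversion):
    """Two-option conditional logit with Av as the zero-utility reference (an assumption).

    Returns (p_ap, u_ap), where u_ap = b0 + b1 * Reward + b2 * Aversion.
    """
    u_ap = beta[0] + beta[1] * np.asarray(reward) + beta[2] * np.asarray(aversion)
    return 1.0 / (1.0 + np.exp(-u_ap)), u_ap

def conflict(p_ap):
    """Binary entropy of the choice probabilities in bits (1 bit at p_ap = 0.5)."""
    p = np.clip(np.asarray(p_ap, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
```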

What stands out from this analysis is that most single predictors in the cOFC were balanced between positive and negative correlations with cue-period firing rates, whereas in the pACC the predictors were mostly negatively correlated with cue-period firing rates.

Differential task-related firing patterns recorded in the pACC and cOFC

To compare task-related firing patterns between the pACC and cOFC, we examined the proportions of excitatory, inhibitory, and unresponsive units across key Ap-Av task events. These analyses revealed a consistent pattern: the recorded cOFC units exhibited stronger excitatory modulation, particularly during reward-related outcomes and choice cues, whereas the recorded pACC units included higher proportions of non-significant changes and, in some conditions, inhibitory responses, especially in aversive or less goal-directed contexts. This regional dissociation aligns with the proposed roles of these regions in outcome valuation (cOFC) and cost-related control or suppression (pACC).

To facilitate comparison of the functional response properties of the pACC and the cOFC, we summarized the proportions of excitatory, inhibitory, and unresponsive units across key Ap-Av and Pavlovian task events (Table 1). During most task events, a larger proportion of cOFC units than pACC units exhibited excitatory responses, whereas the pACC had a greater proportion of inhibitory responses. Around the cue onset event, firing rates were generally higher for the cOFC units (purple in Supplementary Fig. 4) than for the pACC units (green), with noticeable excitation after cue onset in the cOFC and slight inhibition during the same period in the pACC (Supplementary Fig. 4a). For most task events, this distribution pattern was maintained (Supplementary Fig. 4b).

Table 1 Proportions and counts of excitatory, inhibitory, and unresponsive pACC and cOFC units across key events in the Ap-Av task

We further compared the distributions of excitatory, inhibitory, and non-significant units using a combination of 2 × 3 and 2 × 2 chi-square tests. The 2 × 3 analysis revealed significant differences between the pACC and cOFC during the choice cue period (p = 0.006), Pavlovian reward period (p < 0.0001), approach reward period (p < 0.0001), and Pavlovian airpuff period (p < 0.0001). Follow-up 2 × 2 tests identified the directionality of these effects, showing, for example, that in the pACC the number of activated units increased from the Pavlovian reward cue period to the choice cue period, and in the cOFC excited units increased markedly from the Pavlovian reward cue period to the choice cue period (all p < 0.0001). Outcome-related comparisons also revealed reward-enhanced excitation and airpuff-driven inhibition in the pACC, and outcome valence sensitivity in the cOFC (all p < 0.0001).
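The 2 × 3 omnibus test and a 2 × 2 follow-up can be sketched with scipy.stats.chi2_contingency. In practice the counts would come from Table 1; the category-versus-rest construction of the 2 × 2 tables is an assumption about how the follow-up tests were formed, and the function names are hypothetical.

```python
from scipy.stats import chi2_contingency

CATEGORIES = ("excitatory", "inhibitory", "non-significant")

def region_by_response_test(counts_a, counts_b):
    """2 x 3 test: rows are two regions (or events), columns are unit counts per CATEGORIES."""
    chi2, p, dof, _ = chi2_contingency([list(counts_a), list(counts_b)])
    return chi2, p, dof

def followup_2x2(counts_a, counts_b, category):
    """2 x 2 follow-up: one response category vs all remaining units, for two rows."""
    idx = CATEGORIES.index(category)
    table = [[c[idx], sum(c) - c[idx]] for c in (counts_a, counts_b)]
    chi2, p, _, _ = chi2_contingency(table)
    return chi2, p
```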

In addition to unit counts, we compared firing rates across the key events and unit classifications, using Wilcoxon rank-sum tests on the smoothed unit data (0–1 s post-event). In the pACC, excitatory units fired significantly more during the Ap reward period than during the choice cue period (p = 0.022) or the Pavlovian reward cue period (p = 0.029), with markedly stronger firing during reward than during airpuff outcomes (e.g., Ap reward period compared with Ap airpuff period: p < 0.0001; Pavlovian reward period compared with Pavlovian airpuff period: p = 0.001). Inhibitory units in the pACC had the largest effect sizes, with significant differences across almost all cue and outcome comparisons (e.g., Ap reward period compared with Ap airpuff period: p < 0.0001; Pavlovian airpuff cue period compared with Pavlovian reward period: p < 0.0001), indicating robust suppression, especially in aversive contexts. Even pACC units without significant event-related changes showed systematic modulation between outcomes (e.g., Ap reward period compared with Pavlovian reward period: p = 0.021).
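The firing-rate comparisons can be sketched as a window average followed by a two-sided rank-sum test. This assumes event-aligned, smoothed peristimulus time histograms stored as a units × time array with a matching time axis in seconds; the smoothing kernel is not specified here and is left to the caller. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums

def window_mean(psth, t, start=0.0, stop=1.0):
    """Mean smoothed firing rate per unit within [start, stop) s after event onset.

    psth: (n_units, n_times) array; t: (n_times,) event-aligned time axis in s.
    """
    in_window = (t >= start) & (t < stop)
    return psth[:, in_window].mean(axis=1)

def compare_events(psth_a, psth_b, t):
    """Two-sided Wilcoxon rank-sum test on 0-1 s post-event mean rates for two conditions."""
    a, b = window_mean(psth_a, t), window_mean(psth_b, t)
    return ranksums(a, b).pvalue
```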

In the cOFC, excitatory responses were especially enhanced for reward outcomes, with significantly higher firing in the Ap reward period than in the Ap airpuff period (p < 0.0001), the Pavlovian reward cue period (p < 0.0001), and the Pavlovian airpuff period (p < 0.0001). Inhibitory units in the cOFC also showed significant differences across reward and aversive comparisons (e.g., Pavlovian reward period compared with Pavlovian airpuff period: p < 0.0001), though to a lesser extent than in the pACC.

Direct cross-region comparisons confirmed stronger excitatory responses in the cOFC during the Ap reward period (p < 0.0001) and the Ap airpuff period (p = 0.0009), whereas the pACC showed significantly greater inhibitory responses during the same conditions (all p < 0.0001). Among units without significant event-related changes, a few differences were also detected (e.g., pACC compared with cOFC during the Pavlovian reward period: p < 0.0001), suggesting region-specific modulation even among weakly responsive neurons.

The combined unit count and firing rate analyses reveal a clear functional dissociation between the pACC and cOFC. Across both Pavlovian and choice conditions, cOFC neurons displayed stronger excitatory responses, particularly during reward-related events, whereas pACC neurons showed more pronounced inhibitory responses, especially in aversive contexts. These distinctions were consistently supported by statistical comparisons of response proportions and firing rates. This complementary pattern suggests that the cOFC primarily contributes to outcome valuation, whereas the pACC is more engaged in cost-related suppression and aversive processing within affective decision-making circuits.

Dynamics of responses in the cOFC and pACC

To examine the relative timing of excitatory population responses in the cOFC and the pACC, we developed a method to search for significant increases in firing rate in the normalized data shown in Fig. 5 during the task period from 0 to 2 s by requiring that there be three bins in a row with increased firing rates, where the bin size was 50 ms. To set an appropriate threshold of significance for an individual bin, we applied Bonferroni corrections for 38 comparisons (i.e., the number of sets of 3 successive bins that are possible in the 40 bins comprising the 0–2 s range) to the standard p < 0.05 significance threshold, and took the cube root of that number to account for the fact that the false alarm rate for three bins in a row is the cube of the false alarm rate for one bin (p = (0.05/38)^(1/3)). We then computed the threshold of significance at that p level from the inverse normal cumulative distribution function. The results are shown in several formats in Supplementary Fig. 5a. The first column shows histograms of the distributions of response times relative to the cue onset. The second column shows a pseudocolor plot of the normalized firing rates for units sorted by response time, with one unit on each row. The third column shows cumulative distribution functions of the response times across units. Note that each color raster plot (second column) contains a number of units at the top that did not have any responses by the 3-in-a-row criterion. They are included because they did have responses by the chi-squared criterion. However, these units are necessarily not included in the response time analyses because their response time could not be determined. Supplementary Fig. 5b shows the overlaid cumulative distribution functions for all four combinations of task type and recording region. 
The timing distributions for responses to the choice cues were significantly different between pACC and cOFC (Kolmogorov–Smirnov test, p = 0.030), as were those for the Pavlovian cues (Kolmogorov–Smirnov test, p = 0.032). After 0.2 s, the cOFC population response to the choice cue rose clearly faster than the pACC response during the Ap-Av decision-making task. This cOFC lead also held for the Pavlovian task, in which discriminative decision-making is not required. Interestingly, the cumulative distribution curves for the cOFC choice and cOFC Pavlovian conditions overlap.
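The comparison of response-time distributions can be illustrated with a plain two-sample Kolmogorov–Smirnov statistic. The sketch computes the p-value from the large-sample (asymptotic) Kolmogorov distribution. For small numbers of units it may therefore differ from exact methods such as those offered by SciPy's `scipy.stats.ks_2samp`. The function name `ks_two_sample` is hypothetical.

```python
import math
from bisect import bisect_right

def ks_two_sample(a, b, n_terms=100):
    """Two-sample KS statistic D and asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # The largest gap between the two empirical CDFs occurs at a pooled sample point.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # Kolmogorov survival function is essentially 1; series converges poorly
        return d, 1.0
    q = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, n_terms + 1))
    return d, min(1.0, max(0.0, q))
```

Here `a` and `b` would be the response times (in s) of responsive units in the two regions for one task type.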

Fig. 5: Differential firing patterns of pACC and cOFC units during choice and Pavlovian trials in the Ap-Av task.

Each panel displays a color raster plot of the firing rates of individual units aligned on event onset, with three main compartments: the top compartment consists of units that were unresponsive to each event, the middle compartment (between the two dotted lines) contains the excitatory units, and the bottom compartment contains the inhibitory units. The middle vertical line indicates the onset of each event. A 2-s period before and after the event onset is plotted. The y-axis of each panel represents the serial number of each unit. The top two rows show the results from the choice trials: the first row for pACC and the second row for cOFC. The bottom two rows present the results from the Pavlovian trials: the third row for pACC and the fourth row for cOFC. Statistical significance was tested using a two-sided chi-square test (p < 0.05) during the 1-s periods before and after the events. The heatmaps illustrate the variation in firing rates (in Hz), with warmer colors indicating higher firing rates and cooler colors indicating lower firing rates. Also see Supplementary Figs. 4 and 5. Source data are provided as a Source Data file.

To determine how units in the two cortical regions responded when only the red bar or only the yellow bar was presented during Pavlovian trials, we analyzed the effects of different cue/outcome sizes. We used six different sizes/durations of reward/aversion (30, 60, 90, 120, 150, and 180), designed to span the range of sizes/durations (5–200) of the red/yellow bars in choice trials. We focused on four primary events of interest: the reward cue (red bar onset), the airpuff cue (yellow bar onset), the reward delivery onset, which follows the presentation of the red bar, and the airpuff delivery onset, which follows the presentation of the yellow bar. For each brain region and event of interest, neuronal firing rates were plotted across time to visually assess potential event-related changes in activity (Supplementary Fig. 6a). For each event and brain region, we extracted neuronal activity data and calculated the mean firing rates within the specified time windows (i.e., 1 s before and after the onset of each event; see Fig. 4b). To assess the significance of the observed trends, we performed a linear regression of firing rate on bar value, calculated the t-statistic for the slope of the fitted line, and computed the corresponding p-value.
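A minimal sketch of the slope test is shown below. It assumes that one mean post-event firing rate is regressed on the six bar values. This assumption is consistent with the t(4) statistics reported below, because six points minus two fitted parameters leaves four degrees of freedom. For exactly four degrees of freedom the Student t CDF has a closed form, so no statistics library is needed. For example, it gives p ≈ 0.033 for t = 3.19. The firing rates in the usage lines are hypothetical, and the function names are illustrative only.

```python
import math

def slope_t_test(x, y):
    """OLS slope of y on x, its t-statistic, and residual degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2                                   # six bar values -> t(4)
    se_slope = math.sqrt(sse / df / sxx)
    return slope, slope / se_slope, df

def two_sided_p_df4(t):
    """Two-sided p-value from the closed-form Student t CDF with 4 degrees of freedom."""
    x = abs(t)
    q = 1.0 + x * x / 4.0
    cdf = 0.5 + 0.375 * x / math.sqrt(q) * (1.0 - x * x / (12.0 * q))
    return 2.0 * (1.0 - cdf)

# Hypothetical mean post-event firing rates (Hz) for the six bar values
bars = [30, 60, 90, 120, 150, 180]
rates = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
slope, t, df = slope_t_test(bars, rates)
p = two_sided_p_df4(t)
```

For other sample sizes, a general t distribution (e.g., `scipy.stats.t.sf`) would replace the df = 4 closed form.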

For the cOFC, we found significant positive effects for both the reward cue (t(4) = 14.21, p < 0.0001) and the airpuff cue (t(4) = 3.19, p = 0.03), indicating that post-event firing rates increased with cue size. By contrast, the pACC trends for both the reward cue (t(4) = 0.2, p = 0.85) and the airpuff cue (t(4) = −1.35, p = 0.25) did not reach significance. During the reward delivery event, the cOFC did not exhibit a significant trend (t(4) = −0.29, p = 0.78), whereas the pACC exhibited a marginal negative trend (t(4) = −2.55, p = 0.06). For the airpuff delivery event, both the cOFC (t(4) = 6.33, p = 0.003) and the pACC (t(4) = 3.94, p = 0.0217) exhibited significant positive trends, again indicating that post-event firing rates increased with outcome size.

EMS applied to the cOFC alters approach-avoidance decision-making

We performed 87 cOFC EMS sessions across the two monkeys and identified 28 sites at which EMS induced significant behavioral changes (15 sites for monkey P and 13 sites for monkey D; two-sided Fisher’s exact test, p < 0.05). We used 1–1.5-s trains of biphasic pulses coincident with the choice cue period onset, with either high (150–200 μA) or low (5–15 μA) currents (Fig. 1; green dotted rectangles). Each site was stimulated during the middle block of the Ap-Av EMS task, flanked by blocks without EMS. To perform EMS in a session, we required that choice behavior on the Ap-Av task be stable across at least two consecutive days. Fig. 6a, b illustrate the results from a single cOFC stimulation session in which no significant change in Ap-Av choices was observed. Of the 28 significant sites, 20 (71.4%) showed a decrease in Ap and an increase in Av, 5 (17.9%) showed decreases in both Ap and Av, 2 (7.1%) showed increases in both Ap and Av, and 1 (3.6%) showed an increase in Ap and a decrease in Av during the EMS block. In sessions with statistically significant EMS effects (Fig. 6c–f), the effects (increases in avoidance or approach choices) were generally most pronounced at the beginning of a block and gradually faded as the block progressed (Fig. 6g).

Fig. 6: Effects of EMS on decision-making.

a A sample EMS session, demonstrating stable behavior (non-significant increase/decrease of approach/avoidance behavior) in the Stim-off block and Stim-on block with high current EMS (150–200 μA). b Matrix plot of t-scores during high current EMS. c An EMS session in which a small but significant increase occurred in avoidance behavior during EMS with high currents. d Matrix plot of t-scores during high current EMS. e An EMS session demonstrating significant effects between the blocks during low current EMS (5–15 μA). f Matrix plot of t-scores during low current EMS, between the Stim-off and Stim-on blocks. g Relative change of approach and avoidance behavior during sessions in which EMS had a statistically significant effect (two-sided Fisher’s exact test, p < 0.05; left panel, 29 sessions) and sessions that did not have a statistically significant effect (right panel, 64 sessions; see Methods). Each block of trials was divided into five equal segments, and the percentage of Ap and Av trials within each segment was calculated. For EMS-significant sessions, avoidance changes across segments were 75.9 ± 18.4%, 67.6 ± 26.7%, 39.8 ± 15.6%, 37.2 ± 11.9%, and 9.7 ± 4.7%, while approach changes were −22.8 ± 3.0%, −12.4 ± 4.7%, −1.8 ± 9.6%, −9.3 ± 7.8%, and −4.2 ± 5.1% (n = 28 per segment). For EMS-non-significant sessions, avoidance changes were 22.6 ± 6.2%, 13.0 ± 5.4%, 13.8 ± 4.0%, 5.9 ± 4.6%, and 4.6 ± 3.4%, and approach changes were −4.4 ± 2.4%, −1.7 ± 2.1%, −3.8 ± 1.9%, 4.1 ± 4.0%, and −0.8 ± 2.3% (n = 63 per segment). Between-group comparisons for each segment were performed using two-sided t-tests (avoidance p = 0.0008, 0.006, 0.03, 0.004, 0.39; approach p = 2.9 × 10⁻⁵, 0.02, 0.78, 0.10, 0.48). No adjustments for multiple comparisons were applied.
h Decision boundaries from 10 consecutive testing sessions in which the monkeys exhibited stable behavior (left panel) and those in which the monkeys exhibited non-stable behavior (two-sided Fisher’s exact test, p < 0.05). i Coronal MRI slices at AP +32 mm, +30 mm, and +28 mm, referenced to the center of the ear bar. Slice positions were determined based on the actual physical distance from this origin, using the native MRI coordinate system. Red asterisks indicate sites at which EMS induced a statistically significant behavioral effect (two-sided Fisher’s exact test, p < 0.05). See Methods for details on the construction and analysis of decision matrices. Also see Supplementary Fig. 7. Source data are provided as a Source Data file.

To analyze this temporal pattern, we divided each block of trials into five equal segments and calculated the percentage of Ap and Av trials within each segment, excluding error trials. Relative change was determined for each segment by comparing the corresponding segment of the second block (Stim-on block), during which EMS was delivered, with the matching segment of the first block (Stim-off block), during which no EMS was applied. Compared with sessions without an EMS effect, avoidance remained significantly increased in four of the five segments (80% of the block; Fig. 6g). Approach was significantly decreased in the first two segments (40% of the block).
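The segment-wise comparison can be sketched as follows. The text does not give the exact relative-change formula. This sketch assumes relative change = 100 × (Stim-on − Stim-off) / Stim-off for each segment. It also assumes that error trials are dropped after splitting each block into segments. Both assumptions, as well as the trial labels `"Ap"`, `"Av"`, and `"err"`, are illustrative only.

```python
import math

def segment_percentages(choices, n_segments=5):
    """Percent Ap and Av choices in each of n equal segments of one block.

    `choices` lists trial outcomes in order: "Ap", "Av", or "err" (error trial)."""
    n = len(choices)
    result = []
    for k in range(n_segments):
        seg = choices[k * n // n_segments:(k + 1) * n // n_segments]
        valid = [c for c in seg if c in ("Ap", "Av")]   # exclude error trials
        if not valid:
            result.append((math.nan, math.nan))
            continue
        ap = 100.0 * valid.count("Ap") / len(valid)
        result.append((ap, 100.0 - ap))
    return result

def relative_change(off_pct, on_pct):
    """Assumed relative change (%) of the Stim-on segment vs. the matching Stim-off segment."""
    def rel(on, off):
        return math.nan if off == 0 else 100.0 * (on - off) / off
    return [(rel(ap_on, ap_off), rel(av_on, av_off))
            for (ap_off, av_off), (ap_on, av_on) in zip(off_pct, on_pct)]
```

Averaging these per-segment values across sessions, separately for EMS-significant and non-significant sessions, would give curves of the kind summarized in Fig. 6g.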

In this analysis (Fig. 6a–f), a two-sided Fisher’s exact test was used to evaluate whether choice behavior differed significantly between two blocks of trials. The Ap and Av choices were divided into those in Block 1 (Stim-off block) and those in Block 2 (Stim-on block), and the total counts of Ap and Av choices were compared between the blocks. Separate two-sided Fisher’s exact tests were then conducted for Ap and Av choices to determine whether each behavior changed. For Ap choices, we compared the proportion of Ap choices between Block 1 and Block 2. A statistically significant increase in Block 2 relative to Block 1 would indicate an increase in Ap behavior, and a significant decrease would indicate a reduction. Analogous analyses were performed for Av choices to determine whether Av behavior increased or decreased in Block 2 compared to Block 1.
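A two-sided Fisher’s exact test for a 2 × 2 table of choice counts can be written directly from the hypergeometric distribution, as sketched below. The sketch follows the common convention of summing the probabilities of all tables (with the same margins) that are no more likely than the observed one. The counts in the usage line are hypothetical, and `fisher_exact_two_sided` is an illustrative name, not the study's code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table."""
    r1, c1 = a + b, a + c
    n = a + b + c + d
    denom = comb(n, r1)

    def prob(x):  # probability that the top-left cell equals x, margins fixed
        return comb(c1, x) * comb(n - c1, r1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, r1 - (n - c1)), min(r1, c1)
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                        if p <= p_obs * (1 + 1e-7)))  # tolerance for float ties

# Hypothetical counts: rows = Block 1 (Stim-off), Block 2 (Stim-on); columns = Ap, Av
p_value = fisher_exact_two_sided(60, 40, 42, 58)
```

`scipy.stats.fisher_exact(table, alternative="two-sided")` implements the same test if SciPy is available.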

The findings suggest that the cOFC can modulate avoidance behavior by influencing decision-making processes involving cost-benefit evaluations. Specifically, the cOFC appears to encode a negative bias in valuation, enhancing the weighting of aversive outcomes over that of rewards. The increase in avoidance behavior induced by EMS of the cOFC indicates that this region contributes to the assessment of potential risks, promoting actions that favor avoidance when potential aversive outcomes are present. This aligns with the proposed function of the cOFC in integrating aversive and reward-related information to guide adaptive behavior in situations involving Ap-Av conflicts.

We further performed a basic event-related analysis, as shown in Supplementary Fig. 7a. The results from the EMS blocks largely mirrored those from the standard Ap-Av sessions, showing a predominance of the cOFC across most task events. The exception was reward delivery events in both Ap and Av choices, for which the cOFC appeared more active than the pACC. This pattern contrasts with that found for the standard sessions, in which airpuff delivery was the event for which the pACC appeared more active. Notably, three events were the only ones showing a statistically significant difference in spike counts in favor of the cOFC over the pACC. Two of these were reward delivery events: task avoidance reward onset (Av rew ON) and Pavlovian reward delivery (Red bar REW). The third was the highlight around the cross for choosing Ap (Ap highlight) (Av rew ON: χ² = 6.54, p = 0.01; Red bar REW: χ² = 14.22, p = 0.0002; Ap highlight: χ² = 5.76, p = 0.02). The percentage of neurons showing statistically significant activity increased during the microstimulated cue events (Choice cue Onset, Red bar Onset, Yellow bar Onset), but this result could have been confounded by EMS artifacts.
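The between-region comparisons above are reported as chi-square statistics. Below is a minimal sketch of a Pearson chi-square test on a 2 × 2 contingency table. An example input would be hypothetical counts of responsive and non-responsive units in the cOFC and pACC for one event. The text does not state whether a continuity correction was applied, so this version omits it (an assumption). With one degree of freedom the p-value is erfc(√(χ²/2)), which gives p ≈ 0.01 for χ² = 6.54.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square (no Yates correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p

# Hypothetical counts: rows = cOFC, pACC; columns = responsive, non-responsive units
chi2, p = chi2_2x2([[30, 70], [15, 85]])
```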

In summary, following cOFC EMS, a higher proportion of cOFC neurons than pACC neurons exhibited significant activity in response to reward, suggesting potentially greater involvement in, or sensitivity to, positive outcomes. For Pavlovian trials during cOFC EMS (Supplementary Fig. 7b), there was a significant negative parametric modulation of firing rates by the size of the offered cue/outcome (t(4) = −4.48, p = 0.01). This finding suggests that cOFC EMS may alter how reward-related information is processed, potentially reducing sensitivity to larger rewards. Such a shift could account for the increased avoidance behavior observed following both low- and high-current EMS.

Autonomic and somatic responses during Ap-Av and Pavlovian task performance

In addition to the electrophysiological data, we measured two autonomic factors (HRV and pupil diameter) and two somatic factors (lick activity and RTs) during task performance (Fig. 7). We confirmed that pupil diameter was affected by task factors in addition to cue luminance (see Methods and Supplementary Fig. 8). All of the physiological metrics were significantly correlated with the simultaneously recorded behavior and with each other. For each of the four measures, we calculated the mean activity separately for Ap and Av trials over the period from choice cue onset to just before outcome period onset (approximately 3.7 s). We then took the difference between Ap and Av for each metric. The matrices surrounding the Behavior matrix in Fig. 7 show these Ap − Av differences in RT, HRV, pupil diameter, and lick activity.
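One way to build such an Ap − Av difference matrix is sketched below. The sketch assumes that each trial has already been reduced to a single value of the metric (its mean over the cue-to-outcome window). It also assumes that trials are indexed by the offered reward and airpuff levels. Cells without both Ap and Av trials are left as NaN. The tuple layout and the name `ap_minus_av` are hypothetical.

```python
import math
from collections import defaultdict

def ap_minus_av(trials):
    """Ap - Av difference of a per-trial metric for each offer condition.

    trials: iterable of (reward_level, airpuff_level, choice, metric_value),
    where choice is "Ap" or "Av". Returns {(reward, airpuff): mean_Ap - mean_Av}."""
    sums = defaultdict(lambda: {"Ap": [0.0, 0], "Av": [0.0, 0]})
    for reward, airpuff, choice, value in trials:
        acc = sums[(reward, airpuff)][choice]
        acc[0] += value   # running sum of the metric
        acc[1] += 1       # number of trials
    diff = {}
    for cell, acc in sums.items():
        (s_ap, n_ap), (s_av, n_av) = acc["Ap"], acc["Av"]
        diff[cell] = s_ap / n_ap - s_av / n_av if n_ap and n_av else math.nan
    return diff
```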

Fig. 7: Correlations between physiological metrics and behavioral responses during Ap-Av task.

The metrics displayed include lick activity, RT, HRV, and pupil diameter, averaged from cue onset to outcome onset. Each subplot shows these metrics against reward amount (%) and airpuff amount (%). Significant correlations are highlighted with arrows and correlation values. All correlations were calculated using two-sided Pearson tests, and the exact statistics for the relationships shown in the diagram were: RT-Decision (r = 0.55, p = 3.8 × 10⁻¹¹⁹); HRV-RT (r = 0.30, p = 9.9 × 10⁻³³); HRV-Decision (r = 0.33, p = 1.2 × 10⁻⁴⁰); HRV-Pupil (r = 0.05, p = 0.042); Lick-RT (r = −0.29, p = 7.5 × 10⁻³⁰); Lick-Decision (r = −0.42, p = 9.0 × 10⁻⁶⁵); Lick-HRV (r = −0.19, p = 1.0 × 10⁻¹³); Lick-Pupil (r = −0.08, p = 0.003); Pupil-RT (r = 0.09, p = 0.0007); and Pupil-Decision (r = 0.18, p = 3.6 × 10⁻⁸). The central panel consolidates these findings, depicting behavior as a function of reward amount (%) and airpuff amount (%), and emphasizing the collective impact of these physiological measures on the decision-making process in the Ap-Av task. The decision boundary in the physiological data plots is identical to the decision boundary during the respective behavior, for illustration purposes. No adjustments for multiple comparisons were applied. Also see Supplementary Figs. 1, 2, and 8–10. Source data are provided as a Source Data file.

For the Pavlovian trials, which had a shorter duration (~3 s) due to the absence of the decision/choice component (peripheral targets), we also calculated the average activity for the metrics other than RTs from the single cue (red or yellow bar) presentation until right before the outcome period. We observed distinct patterns of variation with outcome size/duration in lick activity, HRV, and pupil diameter across reward and airpuff trials (Supplementary Fig. 9). Lick activity was, as expected, higher in reward trials than in airpuff trials and decreased at the smallest reward size. HRV followed a U-shaped pattern during both reward trials and airpuff trials. This pattern suggests reduced autonomic variability during intermediate outcomes and enhanced engagement when outcomes were either highly rewarding or highly aversive.

Changes in pupil diameter were more pronounced during airpuff trials, with a notable decrease in diameter as airpuff sizes increased, whereas pupil dilation varied little with reward size during the period when reward anticipation could occur. For lick activity only, we included 5 s before the cue period (which included the 1-s fixation period and most of the inter-trial intervals) and the outcome period until the offset of the reward delivery event for Ap and Av trials as well as for Pavlovian trials, as we wanted to learn how the lick activity changed throughout the course of a full trial (Supplementary Fig. 10). In Choice trials, lick activity peaked at reward delivery, with small variations at earlier trial events. The same pattern was visible in Airpuff Pavlovian trials, but in Reward Pavlovian trials licking peaked most strongly toward the end of the fixation period.

We computed Pearson’s correlation coefficients between every pair of matrices in Fig. 7. HRV during choice trials was significantly positively correlated with observed behavior (r = 0.33, p < 0.0001), RTs (r = 0.30, p < 0.0001), and pupil diameter (r = 0.05, p = 0.04), and was negatively correlated with lick activity (r = −0.19, p < 0.0001). Lick activity was negatively correlated with observed behavior (r = −0.42, p < 0.0001), pupil diameter (r = −0.08, p = 0.003), and RTs (r = −0.29, p < 0.0001). Pupil diameter (r = 0.18, p < 0.0001) and RT (r = 0.55, p < 0.0001) were both significantly correlated with observed behavior. Across the entire trial duration, lick activity in choice trials increased similarly for both trial types (Ap and Av) before the monkeys received information about the offer (Supplementary Fig. 10a). Once the offer was presented, licking was slightly elevated for trials ending in approach (green), continuing until the outcome periods. During these outcome periods, licking increased at different time points corresponding to the reward delivery times. During the Pavlovian trials (Supplementary Fig. 10b), licking rose sharply during the fixation period for trials in which the monkey would freely receive a reward (green). This pattern suggests that the monkeys anticipated the reward in the Pavlovian reward trials, which were consistently presented 7–8 trials after the Pavlovian aversion trials.
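The pairwise correlations between matrices can be sketched by flattening two equally shaped matrices cell by cell and computing Pearson’s r. In this sketch, cells that are NaN in either matrix (e.g., offer conditions without both Ap and Av trials) are skipped, which is an assumption about how missing cells were handled. A p-value would then follow from t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The name `pearson_r` is illustrative.

```python
import math

def pearson_r(m1, m2):
    """Pearson r between two equally shaped matrices (lists of rows), flattened
    cell-wise; cells that are NaN in either matrix are skipped. Returns (r, n)."""
    pairs = [(a, b) for row1, row2 in zip(m1, m2) for a, b in zip(row1, row2)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n
```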

Modeling EMS effects in reinforcement learning agents

To explore the dynamics of adaptive strategies under the influence of EMS, we employed linear advantage actor-critic (A2C) models, a class of reinforcement learning algorithms51. Our model aimed to capture the general behavioral patterns observed across both monkeys and all sessions in the Ap-Av task, rather than to fit specific individual sessions.

In the Ap-Av task, the monkeys were considered to use both potential rewards (x1) and risks (x2) in making their decision, as described in previous studies16,52. Here, the decision metric d is calculated as:

$$d = a_1(x_1 + b_1) + a_2(x_2 + b_2)$$

where a1 and a2 represent the weights the agent assigns to each observation, signifying their relative importance, and b1 and b2 are biases that adjust the baseline values of the observations. The decision to approach or to avoid is not made directly from d, but rather through converting d into a probability via the logistic function47:

$$p_{\text{approach}} = \frac{1}{1 + e^{-d}}$$

This probabilistic formulation captures the agent’s graded tendency toward either approaching or avoiding, based on its calculated confidence level, rather than imposing a deterministic choice rule. Our experimental findings for the Ap-Av task indicated that EMS predominantly skewed the monkeys’ decisions toward avoidance. To simulate the effect of EMS in our model, we introduced a parameter γ and a baseline decision bias term β, modifying the decision metric to:

$$d = a_1(x_1 + b_1) + \gamma\, a_2(x_2 + b_2) + \beta$$

The γ parameter allows us to modulate the effect of the aversive stimulus (x2) on the decision-making process. When γ increases, it amplifies the weight of the aversive stimulus, simulating the effect of EMS in increasing avoidance behavior.
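The decision rule can be written out directly. The sketch below implements only the choice policy, that is, the modified decision metric passed through the logistic function. It does not implement the actor-critic learning loop. It assumes a1 > 0 and a2 < 0, so that larger aversive offers lower the approach probability; under that assumption, raising γ lowers p_approach. Setting γ = 1 and β = 0 recovers the baseline metric. The weight values in the usage comment are hypothetical.

```python
import math

def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def p_approach(x1, x2, a1, a2, b1=0.0, b2=0.0, gamma=1.0, beta=0.0):
    """Approach probability from d = a1(x1 + b1) + gamma * a2(x2 + b2) + beta."""
    d = a1 * (x1 + b1) + gamma * a2 * (x2 + b2) + beta
    return _sigmoid(d)

# Hypothetical weights a1 = 4, a2 = -4 with equal offers x1 = x2 = 0.5:
# gamma = 1 gives d = 0 (p = 0.5); gamma = 1.5 gives d = -1 (p below 0.5).
```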

We applied this model to behavioral data from 9542 trials from monkey D across 13 sessions and 9757 trials from monkey P across 15 sessions in which statistically significant behavioral change was induced. Each trial represented the monkey’s approach/avoidance decision given specific reward and aversion offer values. The model was fit separately for baseline and EMS conditions using maximum likelihood estimation. We computed point estimates for all parameters, 95% confidence intervals via bootstrapping (1000 resamples), and statistical comparisons using paired t-tests. For monkey D, γ increased from 0.10 (95% CI: 0.10 to 0.21) at baseline to 0.19 (95% CI: 0.11 to 0.31) under EMS, an 83.7% increase (t = −43.25, p < 0.0001). The β parameter also became more negative, shifting from −0.53 (95% CI: −0.79 to −0.32) to −0.94 (95% CI: −1.16 to −0.74; t = 81.09, p < 0.0001), indicating a stronger baseline avoidance bias. For monkey P, γ increased from 0.43 (95% CI: 0.34 to 0.56) to 0.68 (95% CI: 0.56 to 0.80), a 58.1% increase (t = −104.42, p < 0.00001). The β parameter changed only slightly, from −1.09 (95% CI: −1.29 to −0.90) to −1.10 (95% CI: −1.33 to −0.90), although the difference was statistically significant (t = 2.20, p = 0.03). These modeling results indicate that EMS increased aversion sensitivity (γ) in both monkeys. They also show that EMS strengthened the baseline avoidance bias (β), most clearly in monkey D.
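The bootstrap confidence intervals can be illustrated with a generic percentile bootstrap. This is a sketch under stated assumptions: trials are the resampling unit, and `statistic` stands in for a hypothetical function that refits the model and returns, for example, γ. The text does not specify either choice. The usage lines use a simple proportion only to keep the example self-contained.

```python
import random

def bootstrap_ci(data, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap (1 - alpha) CI for statistic(data).

    Resamples items (assumed here to be trials) with replacement."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return statistic(data), (lo, hi)

# Example: CI for the proportion of approach choices (1 = Ap, 0 = Av), hypothetical data
choices = [1] * 60 + [0] * 40
estimate, (ci_low, ci_high) = bootstrap_ci(choices, lambda xs: sum(xs) / len(xs))
```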

We illustrate in Fig. 8 how cOFC EMS modulates cost-benefit decision-making in monkeys P and D using an A2C reinforcement learning framework. Figure 8a, b and Fig. 8c, d show aggregated behavioral data across all significant sessions from monkeys P and D, respectively. Red and green points denote Av and Ap choices. Logistic regression decision boundaries are shown for baseline (solid black line) and EMS (dashed black line) conditions. In both monkeys, EMS shifts the decision boundary, indicating an altered weighting of aversion versus reward. To further explore the effect of EMS intensity on decision-making, we simulated the A2C model under systematically increasing aversion sensitivity values (γ = 1.10 to 1.50), holding β constant. Figure 8e demonstrates how higher γ values lead to steeper decision boundaries and stronger avoidance behavior, even when external stimuli remain constant. These patterns reflect a model-based interpretation of how EMS might increase the brain’s perceived salience of aversion, thereby shifting the behavioral strategy toward avoidance. Figure 8f shows mean γ values across significant sessions for both monkeys during baseline and EMS blocks, highlighting a consistent trend toward increased aversion sensitivity under EMS.
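The decision boundary follows from setting d = 0 (p_approach = 0.5) in the modified decision metric, which gives x2 = −(a1(x1 + b1) + β) / (γ a2) − b2. The sketch evaluates this boundary for the γ range used in the simulations. It assumes hypothetical weights a1 = 4 and a2 = −4 (with a2 < 0, as above) and no offsets. Under these assumptions, the aversion level tolerated at each reward level shrinks as γ grows, so the approach region contracts. How steep the boundary looks depends on which quantity is plotted on which axis.

```python
def boundary_aversion(x1, a1, a2, b1=0.0, b2=0.0, gamma=1.0, beta=0.0):
    """Aversion level x2 at which d = 0 (p_approach = 0.5) for reward level x1."""
    return -(a1 * (x1 + b1) + beta) / (gamma * a2) - b2

# Boundary points for a few reward levels under increasing aversion sensitivity
reward_levels = [0.25, 0.5, 1.0]
boundaries = {g: [boundary_aversion(x1, a1=4.0, a2=-4.0, gamma=g) for x1 in reward_levels]
              for g in (1.1, 1.3, 1.5)}
```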

Fig. 8: Modeling stimulation-induced shifts in cost-benefit decision-making.

Aggregated behavioral data from monkey P (a, b) and monkey D (c, d) across significant sessions, modeled using an A2C reinforcement learning framework. Red and green dots represent Av and Ap decisions, respectively. Decision boundaries were estimated via logistic regression under baseline (Stim-off; solid black line) and Stim-on (dashed black line) conditions. In both monkeys, EMS shifted the decision boundary, reflecting an altered weighting of aversion and reward. e A2C model simulations showing how increasing γ (aversion sensitivity) values alter decision boundaries under fixed β values, for each monkey. f A2C model parameters: Mean γ (aversion sensitivity) during Stim-off and Stim-on blocks, separately for each monkey across significant sessions (±SEM; monkey P: n = 15 sessions; monkey D: n = 13 sessions). g Econometric model: Boxplots show Δγ across significant and non-significant sessions. The center line represents the median, the bounds of the box denote the 25th and 75th percentiles, and the whiskers extend to the most extreme data points within 1.5× the interquartile range. A more negative Δγ indicates enhanced sensitivity to aversion under EMS. Source data are provided as a Source Data file.

Econometric model

As an independent validation of the findings obtained with the modeling procedures above, we ran a separate econometric analysis using logistic regression (Fig. 8g). Specifically, we fitted a logistic regression model without an intercept using the MATLAB function glmfit (logit link, constant off). For each session and each condition (baseline and stimulation), we estimated separate regression coefficients for reward, b(1), and aversion, b(2). We then computed a sensitivity ratio, γ, defined as the aversion coefficient divided by the reward coefficient (γ = b(2)/b(1)). Because b(1) is typically positive and b(2) negative, γ is usually negative. A more negative γ reflects increased aversion sensitivity (greater weighting of cost relative to benefit), and a γ closer to zero indicates reduced aversion sensitivity. We defined the change in γ as Δγ = γstimulation − γbaseline, where stimulation denotes the EMS block and baseline the pre-stimulation block. This analysis showed that stimulation modulated aversion sensitivity in a subset of sessions, with a general trend toward stronger avoidance behavior (i.e., more negative Δγ) in significant sessions.
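For readers without MATLAB, the same no-intercept logistic model can be sketched in plain Python. MATLAB’s glmfit fits GLMs by iteratively reweighted least squares, which for the canonical logit link is equivalent to Newton–Raphson on the log-likelihood. The sketch below therefore uses Newton–Raphson for the two coefficients. Its coding of approach as 1 and avoidance as 0 is an assumption, and the synthetic data and function names are hypothetical. The sketch is not the study’s code.

```python
import math
import random

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit_no_intercept(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression without intercept for two predictors
    (reward, aversion), fitted by Newton-Raphson. y: 1 = approach, 0 = avoid."""
    b = [0.0, 0.0]
    for _ in range(max_iter):
        g = [0.0, 0.0]                    # score vector (log-likelihood gradient)
        h = [[0.0, 0.0], [0.0, 0.0]]      # Fisher information (negative Hessian)
        for (x1, x2), yi in zip(X, y):
            p = _sigmoid(b[0] * x1 + b[1] * x2)
            w = p * (1.0 - p)
            g[0] += (yi - p) * x1
            g[1] += (yi - p) * x2
            h[0][0] += w * x1 * x1
            h[0][1] += w * x1 * x2
            h[1][1] += w * x2 * x2
        det = h[0][0] * h[1][1] - h[0][1] ** 2
        step = [(h[1][1] * g[0] - h[0][1] * g[1]) / det,   # inverse of the 2x2 matrix
                (h[0][0] * g[1] - h[0][1] * g[0]) / det]
        b = [b[0] + step[0], b[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return b

def aversion_ratio(X, y):
    """gamma = b(2) / b(1): aversion coefficient over reward coefficient."""
    b1, b2 = fit_logit_no_intercept(X, y)
    return b2 / b1

def delta_gamma(baseline, stimulation):
    """Each argument is (X, y) for one block; returns gamma_stim - gamma_baseline."""
    return aversion_ratio(*stimulation) - aversion_ratio(*baseline)

# Synthetic check with hypothetical true weights 2.0 (reward) and -3.0 (aversion)
rng = random.Random(1)
X_demo = [(rng.random(), rng.random()) for _ in range(4000)]
y_demo = [int(rng.random() < _sigmoid(2.0 * r - 3.0 * a)) for r, a in X_demo]
b_demo = fit_logit_no_intercept(X_demo, y_demo)   # should lie near [2.0, -3.0]
```

A more negative `delta_gamma` value corresponds to the stronger aversion weighting described above.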

Because the Δγ distributions were non-normal, we used non-parametric statistics. A Wilcoxon rank-sum test (Mann–Whitney U test) confirmed that Δγ differed significantly between effective and non-effective sites (z = −4.10, p < 0.0001). Furthermore, separate Wilcoxon signed-rank tests showed that Δγ differed significantly from zero in both groups (p = 0.0005 for significant sessions; p = 0.02 for non-significant sessions). This result indicates that stimulation influenced aversion sensitivity across sessions, most robustly in significant cases. This pattern supports the idea that cOFC EMS systematically altered cost-benefit computations in a subset of sessions. Thus, across different modeling schemes, the models captured the directional effects of cOFC EMS on decision boundaries, highlighting how cOFC activity influences cost-benefit valuation and behavioral strategy during Ap-Av conflict.
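The rank-sum comparison of Δγ between effective and non-effective sites can be sketched with the normal approximation to the Mann–Whitney U statistic. The sketch assigns average ranks to ties and applies the standard tie correction to the variance, but no continuity correction. That choice is an assumption, and statistical packages may differ in such details. The function name is illustrative.

```python
import math

def mann_whitney_u(x, y):
    """U for sample x, normal-approximation z, and two-sided p (tie-corrected variance)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1                       # extend the group of tied values
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1   # average 1-based rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, z, p
```

Here `x` and `y` would hold the per-session Δγ values of the significant and non-significant groups.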

Discussion

Our findings in macaque monkeys point to four main conclusions. We consider these in sequence below.

First, there were significant differences in the neural activity of well-isolated units recorded in the cOFC and pACC during performance of both the Ap-Av conflict decision-making and Pavlovian trials. To our knowledge, the cOFC recordings are the first to focus specifically on the most caudal part of the large OFC region, which partly overlaps with the striosome-projecting OFC. This work thus complements findings for more rostral and medial OFC regions analyzed in earlier studies22,23,27,28,29. Our findings for the pACC extend those of our previous studies38,39 (and references therein). The units that we encountered in this caudal OFC subdivision exhibited a balanced response pattern, with similar percentages of excitatory and inhibitory responses, particularly in response to airpuff and reward cues. This balanced mixture of excitatory and inhibitory responses suggests a heterogeneous and flexible coding scheme, which could support the integration of both reward and aversion signals during value-based decision-making. It is also consistent with evidence that the cOFC can dynamically adjust its output to varying motivational contexts, thereby maintaining flexible and adaptive states53,54.

By contrast, the units that we recorded in the pACC exhibited a predominance of inhibitory responses, mainly during the cue period. This inhibitory bias, whose functional implications remain to be clarified, contrasts with the more balanced excitatory and inhibitory responses recorded in the cOFC. The contrast suggests region-specific coding strategies rather than a direct encoding of aversive value, favoring the possibility of region-specific cOFC-pACC network dynamics during affective decision-making. This interpretation is consistent with our unit count and firing rate analyses, which showed a predominance of inhibitory responses in the pACC and a more balanced profile in the cOFC, particularly during outcome-related periods. Even non-significant units exhibited systematic modulation, indicating that weakly responsive populations could contribute to region-specific encoding of motivational valence. The high proportion of unresponsive units in these regions, as found previously for the pACC16, has two possible explanations. These units may be engaged in processes outside the immediate scope of our measurements, including network effects38, or they may have a more selective role in responding to certain types of stimuli55.

Second, the patterns of response during the cue period across both choice and Pavlovian trials were stable in the cOFC and pACC, suggesting that these regions could share a function in inducing or reflecting this stabilization. The analyses of Pavlovian trials helped to delineate the differential involvement of cOFC and pACC neurons in responses to single rewarding and aversive cues (Supplementary Fig. 6b). In the cOFC, both cue events showed a significant positive trend, indicating that post-cue firing rates increased with cue size. The pACC population, by contrast, showed no significant cue-related trends. Like the cOFC, it exhibited a significant positive trend for airpuff delivery, and it also showed a marginal trend for reward delivery (p = 0.06). The remaining trends did not reach conventional levels of statistical significance.

These trends suggest that cOFC activity scales with both rewarding and aversive cues, consistent with a putative role in anticipating reward and aversion. The pACC population responded especially to outcomes, both positive and negative, highlighting its putative involvement in processing potential rewards, threats, or losses. Moreover, the cumulative distribution analysis indicated that in both the Ap-Av and Pavlovian tasks, the rise in activation of the cOFC population preceded that of the recorded pACC population. Together, these findings suggest that different neural circuits, here including the cOFC and pACC, can be involved in evaluating different aspects of cost-benefit decisions. They offer a glimpse, in both time and space, of how the brain navigates complex decisions that entail both positive and negative outcomes.

These findings could yield insights into neural mechanisms underlying mood-related disorders such as major depressive disorder (MDD). The balanced excitatory responses to salient rewarding and aversive offers in the cOFC that we observed in the monkeys might reflect a neural mechanism that, if conserved in humans, could, when disrupted, contribute to MDD symptoms such as anhedonia or heightened sensitivity to either positive or negative stimuli2. The predominant inhibitory response in the pACC could indicate a mechanism for regulating values derived from the integration of positive and negative emotional responses and cognitive processes, potentially contributing to the dysregulation seen in MDD1. Extending our findings could thus have relevance for the clinic.

Third, our findings demonstrated that EMS of the cOFC with both high (150–200 μA) and low (5–15 μA) currents effectively induced avoidance behavior in the monkeys, indicative of a pessimistic shift in decision-making. This effect was observed behaviorally and was also captured by our reinforcement learning model (Fig. 8). In the model, EMS could be simulated by increasing the gamma (γ) parameter, which amplifies the weighting of aversive outcomes relative to rewards. Increasing γ produced a systematic increase in avoidance behavior, mirroring the behavioral changes observed in the monkeys. Both the experimental findings and the modeling therefore suggest that cOFC microstimulation causally biases monkeys' decisions toward avoidance. We could not determine whether this effect reflected an increased perception of cost, enhanced salience of aversive outcomes, disrupted cost-benefit integration, altered decision stochasticity, or other processes. Nevertheless, our results are consistent with experimental evidence that OFC microstimulation can modulate specific decision-making parameters, such as value sensitivity and choice variability, depending on the timing and context of stimulation27.

To consider whether the behavioral effect of cOFC microstimulation might result from non-specific aversiveness of the stimulation, we drew on prior findings that comparable stimulation in the pACC and caudate nucleus did not elicit significant autonomic arousal16,40. We have also found that subgenual ACC stimulation produces a mild increase in skin conductance38, but its magnitude was far smaller than that evoked by aversive airpuffs. These observations suggest that the shift toward avoidance in our study is unlikely to reflect direct aversiveness of the stimulation itself and may instead reflect altered value or conflict processing.

Finally, our experiments revealed consistent correlations between neuronal activity and distinct autonomic and somatic responses recorded during Ap-Av conflict decision-making. The significant correlation between HRV and observed behavior is important in light of evidence that HRV is a reliable indicator of emotional and stress responses, which can influence decision-making56,57. The positive correlation between the difference in HRV and the difference in RTs further strengthens the notion that autonomic activity supports the performance of cognitively demanding tasks, in this case, planning and executing saccades to the target that implements the monkey's Ap or Av decision58. Similarly, the correlation between pupil diameter and behavior highlights the importance of arousal and attentional mechanisms. Pupil diameter is a well-established marker of cognitive and emotional processing59,60, and increased pupil diameter is associated with greater cognitive load and decision uncertainty61. The negative correlations between the difference in lick activity and both observed behavior and the difference in RTs during choice trials suggest that licking behavior, potentially indicative of anxiety or arousal, is inversely related to decision speed and accuracy. The monkeys licked more during Ap trials with large airpuff and small reward cues than during Ap trials with large reward and small airpuff cues, suggesting heightened anticipatory behavior or anxiety in these relatively aversive conditions (Supplementary Fig. 1b). This form of anticipatory anxiety has been observed in many species across the animal kingdom, including monkeys, highlighting its role in preparing animals for expected stressors and their potential impact on behavioral responses62,63. The observed relationships among these physiological metrics further emphasize their interconnected nature.
For instance, the significant correlation between HRV and pupil diameter suggests a link between autonomic regulation and attentional mechanisms, corroborating the idea of a shared underlying system influencing both64. Anticipatory anxiety can lead to increased stress responses and behavioral changes, highlighting the importance of these responses for understanding the pathology of mood disorders and the potential for targeted therapeutic interventions65,66. Thus, our findings suggest close ties between neural activity in the cOFC and pACC and the autonomic and somatic components of behavior that accompany it. We propose that these findings could advance understanding of the brain-body axes that characterize salient and motivationally challenging behavioral states.

The pACC and cOFC are reciprocally connected17 (and references therein) and send outputs to subcortical structures. Notably, the pACC, along with the subgenual ACC and the cOFC, is a limbic system-related cortical region that receives direct inputs from the amygdala34. Both the pACC and cOFC have preferential projections to the striosome compartment of the striatum, especially to the striosomes in the anterior caudate nucleus and the adjoining rostral part of the putamen35,36.

Our study has limitations that should be noted. The findings are based on small subsets of the many thousands of neurons in the cOFC and pACC, and we cannot assess the degree to which differential sampling biases between the cOFC and pACC recordings may have influenced the results. Identifying the neuronal types from which we recorded will help to clarify these issues. Our analyses demonstrated consistent covariation between cOFC activity, behavioral choices, and autonomic responses, including trial-by-trial associations with RT, pupil diameter, and heart rate (Fig. 7 and Supplementary Figs. 8–10), but these relationships remain correlative. The autonomic measures were not directly modeled but could reflect internal motivational states that were modulated by EMS. Clarifying the temporal structure and directionality of these associations will require future modeling work, which should in turn help to specify how such neuronal signals contribute to the coordination of motivational, autonomic, and behavioral states during decision-making.

Using cOFC EMS, we were able to suggest a causal relationship between cOFC activity and pessimistic decision-making, recalling the results of high-current EMS delivered to the pACC. Our findings further demonstrate that the neuronal responses during Ap-Av decision-making, as represented by the activity of the neuronal populations that we recorded, have different kinetics, with the recorded cOFC populations responding earlier during the decision period than the recorded pACC populations and potentially leading activity in the broader OFC-pACC-prefronto-striosomal network38. In sum, our evidence suggests that this cOFC activity is causally related to early decision-making, that it can be biased by external microstimulation, and that it can lead, and thus potentially drive, activity of the cOFC-pACC network as a whole, which is synchronized with multiple autonomic and somatic activity patterns indicative of brain-body synchronization. Given these findings, studies employing pathway-specific interventions and circuit-level recordings should be well positioned to determine how cOFC-pACC-striosomal circuits contribute to affective decision-making in health and to the development of maladaptive behaviors characteristic of mood disorders.

Methods

Subjects

Two Macaca mulatta monkeys, one male (P, 15.4 kg) and one female (D, 9.1 kg), were studied in experiments that strictly adhered to the Guide for Care and Use of Laboratory Animals of the United States National Institutes of Health. All experimental procedures were approved by the Committee on Animal Care of the Massachusetts Institute of Technology. Prior to training, both monkeys underwent a habituation process that acclimated them to a seated position in a monkey restraining chair and ensured a proper facemask fit. All surgical procedures were performed under sterile conditions with deep anesthesia. Postoperative analgesics were administered to the monkeys. Prophylactic antibiotics were injected intramuscularly on the day of the surgery and were continued daily for the subsequent week.

Task procedures

The initial phase of training aimed to acquaint naive monkeys with the testing environment and basic task structure. Monkeys were first presented with a large central white fixation point occupying half of the screen and were required to fixate on it, initially for short durations (e.g., 50 ms). With consistent training, the size of this fixation point was gradually reduced and the required fixation duration was increased, until the monkeys could consistently maintain their gaze for up to 1 s. After establishing consistent fixation behavior, the monkeys were introduced to the red and yellow bars. At this introductory stage, both bars represented positive outcomes. Bars of different sizes were paired with outcomes of correspondingly different magnitudes, so that the monkeys learned that bar size signaled outcome size, a relationship central to the upcoming tasks.

As the training progressed, specific associations were assigned. The red bar continued to be associated with a positive outcome and was now consistently linked with a white cross. The yellow bar, which had initially indicated a positive outcome, was now consistently linked with a white square (undergoing a transformation later in the Ap-Av task, when it began to signify a negative outcome). Training continued with the Ap-Ap task, in which both choices led to positive outcomes. During a typical choice trial, the sequence began with the monkey fixating on the central fixation point for up to 1 s. Then, the cue offer was presented, with two abutted red and yellow bars of random sizes, followed by peripheral symbols representing the outcomes linked to the red (cross) and yellow (square) bars, which appeared for 1 s. The monkey was required to fixate on one of these symbols for at least 200 ms, indicating its choice. A successful choice was confirmed with a yellow highlight circle around the selected symbol, whereas a failure to decide within the given time rendered the trial invalid.

Following a valid choice, an interval of 600–800 ms preceded the outcome associated with the chosen bar. If fixation was broken at any point during the task, a 5-s delay was imposed as a penalty before the next trial began. Upon mastering the Ap-Ap task, monkeys moved to the Ap-Av task, in which the two choices were associated with contrasting outcomes: positive for the red bar and negative for the yellow bar. In the Ap-Av task, the concept changed: instead of choosing between the two bars, the monkeys had to decide whether to approach or to avoid the presented offer by making a saccade to the cross (to approach) or to the square (to avoid). After solidifying their understanding of the Ap-Av task, the monkeys progressed to the Ap-Av-Ap-Ap format, in which Ap-Av blocks (first and third) alternated with Ap-Ap blocks (second and fourth). Pavlovian cue trials, which required no decision, were interspersed with these primary tasks and constituted around 15% of all trials. Bars were presented at six sizes: 30 (5.5 × 8.5 mm), 60 (11 × 8.5 mm), 90 (16.5 × 8.5 mm), 120 (22 × 8.5 mm), 150 (27.5 × 8.5 mm), and 180 (33 × 8.5 mm).

Recording setup

After the behavioral training phase, individualized recording chambers, positioned on the basis of 3 T MRI coordinates, were implanted. These coordinates were derived from T1-weighted (0.5 mm isotropic) and T2-weighted (0.35 mm isotropic) MRI scans, which provided a detailed view of the skull surface. The chambers, fabricated by Gray Matter Research, were implanted in both monkeys (P and D) so as to allow access to a substantial part of the OFC and the anterior striatum. Several weeks after chamber installation, a craniotomy was performed to expose the dura mater, and the chamber was sealed with silicone elastomer (Kwik-Sil) applied directly to the dura mater. To determine the coordinates of each electrode track, an MRI scan (T1- and T2-weighted images, 0.5 mm slice thickness) was acquired with the chamber and grid67 filled with saline combined with a 5% gadolinium contrast agent. Approximately two weeks later, the electrode array implant surgery took place. Chambers were fitted with grids featuring an array of openings (40 × 30 mm), each 0.48 mm in diameter with a center-to-center distance of 1 mm. Probes were secured onto microdrives so that they could be advanced by screw-controlled adjustments (158 µm per complete turn), and custom-made micromanipulators held all electrodes in place on the grid. Under anesthesia and in sterile conditions, platinum-iridium electrodes (impedance 0.8–1.5 MΩ; FHC Inc., Bowdoin, ME) were implanted. For monkey P, 57 electrodes were implanted: 20 in the pACC and 37 in the cOFC. Monkey D was implanted with 48 electrodes: 15 in the vicinity of the pACC and 33 in the vicinity of the cOFC.

Recording of physiological activity

To characterize the internal behavioral states of the monkeys during task performance, we measured licking, pupil diameter, pulse, and RT. Licking was measured with a three-axis accelerometer (SparkFun, MMA8452Q) attached to the mouthpiece that delivered the liquid reward. The accelerometer signals were attenuated to match the input range of the electrophysiology system (SX Neuralynx; ±5 mV) and routed directly to its input, allowing them to be recorded synchronously with neural activity at the same sampling rate. Further processing was performed in MATLAB 2018a. The licking signal was obtained by summing the absolute values of the X, Y, and Z acceleration signals; the combined signal was low-pass filtered at 10 Hz to remove high-frequency noise and downsampled from 32 kHz to 1 kHz to reduce data size and facilitate further analysis. The processed licking signals were saved for subsequent analysis.
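The licking pipeline described above can be sketched in Python as follows. This is a minimal sketch, not the authors' MATLAB code. The text does not state the filter type or order, so the fourth-order zero-phase Butterworth low-pass (SciPy `butter` with `sosfiltfilt`) is an assumption, and the function name `lick_signal` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_RAW = 32_000  # Hz, acquisition rate of the electrophysiology system
FS_OUT = 1_000   # Hz, rate of the saved licking signal

def lick_signal(ax, ay, az, cutoff_hz=10.0, order=4):
    """Combine three accelerometer axes into one licking trace at 1 kHz.

    ax, ay, az: raw acceleration samples at FS_RAW (same length).
    The Butterworth order and zero-phase filtering are assumptions.
    """
    # Sum of absolute accelerations across X, Y and Z
    combined = np.abs(ax) + np.abs(ay) + np.abs(az)
    # 10 Hz low-pass to remove high-frequency noise (forward-backward, no phase lag)
    sos = butter(order, cutoff_hz, btype="low", fs=FS_RAW, output="sos")
    smoothed = sosfiltfilt(sos, combined)
    # Keep every 32nd sample: 32 kHz -> 1 kHz (the 10 Hz filter already prevents aliasing)
    return smoothed[:: FS_RAW // FS_OUT]
```

Because the signal is low-passed far below the new Nyquist frequency (500 Hz), plain sample dropping is sufficient here and no separate anti-aliasing stage is needed.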

To generate the graphs shown in Supplementary Fig. 10, the licking activity of each trial was first interpolated with the MATLAB interp1 function to match the average trial length of its trial type in that session, ensuring consistency across trials of the same type within each session. An average across all trials was then calculated at each time point. The trial-averaged time course of each session was interpolated to match the average trial length across all sessions, and the interpolated waveforms were averaged across sessions. Finally, error bars were computed as the standard error of the mean across sessions.
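The two-stage time normalization can be illustrated with a short sketch that uses `numpy.interp` for linear interpolation, analogous to MATLAB interp1's default method. The helper names are hypothetical. Taking the "average trial length across all sessions" as the mean of the per-session averaged lengths is an assumption.

```python
import numpy as np

def resample_to(trace, n_points):
    """Linearly interpolate a 1-D trace onto n_points evenly spaced samples."""
    trace = np.asarray(trace, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(trace))
    new_t = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_t, old_t, trace)

def session_average(trials):
    """Stretch every trial to the session's mean trial length, then average."""
    target = int(round(np.mean([len(t) for t in trials])))
    return np.mean([resample_to(t, target) for t in trials], axis=0)

def grand_average(sessions):
    """sessions: one list of per-trial traces (a single trial type) per session.

    Returns the across-session mean and its standard error at each time point.
    """
    per_session = [session_average(s) for s in sessions]
    target = int(round(np.mean([len(s) for s in per_session])))
    stacked = np.array([resample_to(s, target) for s in per_session])
    mean = stacked.mean(axis=0)
    sem = stacked.std(axis=0, ddof=1) / np.sqrt(len(stacked))
    return mean, sem
```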

Pupil diameter was measured using an infrared eye-tracking system (SR Research, EyeLink 1000), which was also used for the visually guided task; its signals were likewise attenuated and routed to the electrophysiology system. There were two conditions under which a valid pupil diameter measurement was not possible. First, during blinks, or when the monkey was fully asleep with eyes closed, the pupil diameter signal dropped to an impossibly low value that we refer to here as blink level. Second, when the monkey was drowsy but not fully asleep, the eyes were typically half-closed and produced a pupil diameter signal that varied rapidly between blink level and a potentially plausible diameter reading. These two types of signal were marked as follows. First, we marked all individual samples whose values were at blink level. We then computed the fraction of blink-level samples in a 90 ms sliding window across the entire session. We identified "blinky" time periods, in which every 90 ms window contained at least one blink-level sample, and analyzed them further to determine whether they represented half-closed eyes or well-formed blinks. A well-formed blink was required to contain at least one sample whose 90 ms window consisted entirely of blink-level values, and to be flanked by at least 90 ms before and after in which the 90 ms window contained no blink-level values. Sample values in well-formed blinks were left unmodified. All other blinky periods had their sample values replaced by the blink-level value, making them appear similar to periods when the monkey's eyes were fully closed. These cleaned data were low-pass filtered and downsampled by a factor of 32. All blink-level values in the resulting data were then replaced by NaN (Not-a-Number), so that they would be excluded from calculations of the mean and median pupil diameter.
Per-trial values were computed using samples from cue onset to outcome onset, and the values for choice and Pavlovian trials were saved separately in a .mat file.
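The blink-classification rule can be expressed compactly with a moving count of blink-level samples. The sketch below shows one possible reading of the procedure, not the authors' code. It makes three assumptions: "blink level" is a threshold at or below which samples count, the 90 ms window is centred on each sample, and blinky periods touching the recording edges are treated as not well formed. The function names are hypothetical.

```python
import numpy as np

def _runs(mask):
    """(start, stop) index pairs of consecutive True values; stop is exclusive."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    return list(zip(edges[::2], edges[1::2]))

def clean_blinks(pupil, fs, blink_level, win_ms=90.0):
    """Replace half-closed-eye periods by blink level; leave well-formed blinks unchanged.

    pupil: raw pupil-diameter samples; fs: sampling rate (Hz);
    blink_level: samples at or below this value count as blink level (assumed threshold).
    """
    x = np.asarray(pupil, dtype=float).copy()
    w = max(1, int(round(win_ms * fs / 1000.0)))      # window length in samples
    is_blink = x <= blink_level
    # Number of blink-level samples in the window centred on each sample
    counts = np.convolve(is_blink.astype(int), np.ones(w, dtype=int), mode="same")
    for start, stop in _runs(counts > 0):             # "blinky" periods
        has_full_window = np.any(counts[start:stop] == w)
        flanks_fit = start >= w and stop + w <= len(x)
        clean_flanks = (flanks_fit
                        and not counts[start - w:start].any()
                        and not counts[stop:stop + w].any())
        if not (has_full_window and clean_flanks):
            x[start:stop] = blink_level                # treat like eyes fully closed
    return x

def blink_to_nan(x, blink_level):
    """After filtering/downsampling: exclude blink-level samples from averages."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= blink_level, np.nan, x)
```

Counting samples with integers, rather than computing fractions, keeps the "100% of the window" and "0% of the window" tests exact.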

We did not attempt to match the luminance of the choice cues across different combinations of reward size and airpuff size, as the resulting changes in brightness could have confused the monkeys. To control for the effect of total luminance on pupil size, we compared pupil responses during Ap-Ap trials with those during Ap-Av trials from sessions that combined blocks of both trial types, as was done in the original Amemori study39. Supplementary Fig. 8 shows the residual variation in pupil size that cannot be accounted for by cue luminance, obtained by subtracting the pupil diameters measured in Ap-Ap trials from those measured in Ap-Av trials. This procedure can also be expected to remove the variance due to the general decision-making process per se, leaving only the variance due to decision-making under conflict. The two monkeys showed different patterns of pupil responses across the decision matrix plane. To verify that these patterns were not due to random statistical variation, we used single-sample t-tests to test whether the differences between Ap-Av and Ap-Ap responses were non-zero, and paired-sample t-tests to test whether the high and low points differed from each other. For monkey D (Supplementary Fig. 8, left panel), the bin at airpuff 19, reward 11 differed significantly from zero (p = 0.04), as did the bin at airpuff 8, reward 5 (p = 7.0 × 10⁻⁵), and the two bins differed from each other (p = 0.0009). For monkey P (Supplementary Fig. 8, right panel), the bin at airpuff 1, reward 18 differed significantly from zero (p = 0.003), the bin at airpuff 8, reward 1 did not, and the two bins differed from each other (p = 0.0003).

Pulse was measured with an ear-clip oximeter (SparkFun, SEN-11574) whose signal was recorded in the same way as the licking and pupil diameter signals. Raw oximeter traces were recorded at 32 kHz, low-pass filtered with a zero-phase fourth-order Butterworth filter (100 Hz cut-off), and decimated by a factor of 32 to align with the 1 kHz behavioral streams. HRV metrics were derived by detecting peaks in the z-score-normalized oximeter signal, calculating the intervals between successive peaks, and computing the standard deviation of these intervals (RRstd). Pulse peaks were detected with the MATLAB findpeaks function, using a minimum prominence of three standard deviations and a minimum inter-peak interval of 250 ms. Instantaneous HR and RRstd were calculated from cue onset up to, but not including, outcome onset (≈3.7 s in choice trials and ≈3.0 s in Pavlovian trials) and retained in their native units for all subsequent analyses.
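A Python sketch of the pulse pipeline follows. SciPy's `find_peaks` stands in for MATLAB findpeaks, with `prominence` and `distance` in place of the MATLAB name-value options. Three choices are assumptions: z-scoring within each trial window, returning NaN when fewer than three peaks are found, and using `ddof=1` to match MATLAB's default `std`. As with MATLAB filtfilt, the forward-backward pass doubles the effective order of the fourth-order design. Function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks

def preprocess_pulse(raw, fs_raw=32_000, cutoff_hz=100.0, factor=32):
    """Zero-phase 4th-order Butterworth low-pass, then keep every `factor`-th sample."""
    sos = butter(4, cutoff_hz, btype="low", fs=fs_raw, output="sos")
    return sosfiltfilt(sos, np.asarray(raw, dtype=float))[::factor]

def hr_and_rrstd(pulse, fs, start_s, stop_s, prominence_sd=3.0, min_ipi_s=0.25):
    """Mean instantaneous HR (beats/min) and RRstd (s) in the window [start_s, stop_s)."""
    seg = pulse[int(round(start_s * fs)):int(round(stop_s * fs))]   # stop sample excluded
    z = (seg - seg.mean()) / seg.std()                                # z-score the window
    peaks, _ = find_peaks(z, prominence=prominence_sd,
                          distance=int(round(min_ipi_s * fs)))
    if len(peaks) < 3:                                                # need >= 2 intervals
        return np.nan, np.nan
    rr = np.diff(peaks) / fs                                          # inter-beat intervals (s)
    return float(np.mean(60.0 / rr)), float(np.std(rr, ddof=1))
```

For a trial, `start_s` and `stop_s` would be the cue-onset and outcome-onset times on the 1 kHz clock.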

RT was defined as the time between choice cue offset and the saccade to one of the two peripheral targets, the cross (Ap) or the square (Av). RTs were recorded by our electrophysiology system as well as by the behavioral task system that controlled event sequences (NIMH MonkeyLogic)68.

We calculated the difference in average RTs between approach and avoidance choice trials across all session types. RTs were binned by reward and airpuff level in 5% increments of each amount. For each session, the mean RTs of approach and avoidance trials were computed separately within each bin (summed RT divided by trial count), and the Av mean was subtracted from the Ap mean. To highlight patterns across sessions, these differences were averaged across sessions and plotted as heatmaps. To reduce noise and improve the clarity of the heatmaps, we smoothed the RT difference data with a Gaussian filter, creating the kernel with the MATLAB fspecial function and convolving it with the data using the imfilter function. A decision boundary derived from the econometric model (see below) was overlaid to indicate regions of behavioral transition between approach and avoidance decisions. The color scale represents the difference in RTs (approach − avoidance). The same procedures were applied to all physiological differential Ap-Av matrices.
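The binning and differencing step can be sketched with NumPy and SciPy. Reward and airpuff levels are assumed to be scaled to [0, 1], so that 5% increments give 20 × 20 bins. The Gaussian width `sigma` is a placeholder, because the kernel parameters are not given. Empty bins are also handled differently from the MATLAB route: `fspecial` plus `imfilter` would propagate NaNs, whereas this sketch substitutes NaN-aware normalized convolution. All function names are hypothetical.

```python
import warnings
import numpy as np
from scipy.ndimage import gaussian_filter

N_BINS = 20  # 5% increments of reward and airpuff amount

def _bin(level):
    """Map levels scaled to [0, 1] (assumption) onto bin indices 0..N_BINS-1."""
    return np.minimum((np.asarray(level) * N_BINS).astype(int), N_BINS - 1)

def _binned_mean(reward, airpuff, values):
    sums = np.zeros((N_BINS, N_BINS))
    counts = np.zeros((N_BINS, N_BINS))
    rows, cols = _bin(airpuff), _bin(reward)   # rows: airpuff, columns: reward
    np.add.at(sums, (rows, cols), values)
    np.add.at(counts, (rows, cols), 1.0)
    with np.errstate(invalid="ignore"):
        return sums / counts                  # NaN where a bin has no trials

def rt_difference(session):
    """Ap mean RT minus Av mean RT per bin for one session.

    session: dict of equal-length arrays 'reward', 'airpuff', 'rt', 'approach' (bool).
    """
    ap = np.asarray(session["approach"], dtype=bool)
    r, a, rt = (np.asarray(session[k], dtype=float) for k in ("reward", "airpuff", "rt"))
    return _binned_mean(r[ap], a[ap], rt[ap]) - _binned_mean(r[~ap], a[~ap], rt[~ap])

def smoothed_average(per_session, sigma=1.0):
    """Average across sessions, then Gaussian-smooth while ignoring empty bins."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)   # bins empty in every session
        mean = np.nanmean(np.stack(per_session), axis=0)
    valid = ~np.isnan(mean)
    num = gaussian_filter(np.where(valid, mean, 0.0), sigma)
    den = gaussian_filter(valid.astype(float), sigma)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)
```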

The relationships among the physiological metrics were further investigated by calculating Pearson correlation coefficients between pairs of difference variables: pupil size difference, RT difference, decision difference, HRV difference, and licking behavior difference. Before the correlation analysis, we flattened any matrix-form data into one-dimensional vectors to standardize the data structure across variables. We then removed any observation containing a NaN value or a zero in any of the variables, as these could represent missing or non-informative data that might skew the correlation results. Because observations were removed jointly across variables, all data vectors remained aligned and of the same length. The Pearson correlation coefficient and its associated p-value were then calculated for each variable pair.
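The preprocessing and pairwise correlation steps map directly onto `scipy.stats.pearsonr`. The sketch below assumes the difference variables arrive as a name-to-array dictionary; that input format and the function name are hypothetical. Rows are dropped jointly, as described, so every pair is computed on the same observations.

```python
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr

def pairwise_correlations(variables):
    """variables: dict name -> array of any shape (matrices are flattened).

    Returns {(name_i, name_j): (r, p, n)} after removing every observation that is
    NaN or zero in any variable.
    """
    names = list(variables)
    data = np.column_stack([np.asarray(variables[k], dtype=float).ravel() for k in names])
    keep = np.all(~np.isnan(data) & (data != 0), axis=1)   # joint removal keeps vectors aligned
    data = data[keep]
    results = {}
    for i, j in combinations(range(len(names)), 2):
        r, p = pearsonr(data[:, i], data[:, j])
        results[(names[i], names[j])] = (float(r), float(p), int(keep.sum()))
    return results
```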

Econometric modeling

To estimate the monkeys' internal decision variables and to understand their decision-making processes, we used an econometric model to approximate their choice behavior. This model rests on the standard assumptions of discrete choice models: the choice alternatives must be mutually exclusive (choosing one option precludes selecting another) and exhaustive (all possible choices are included and are finite in number). We used the conditional logit model, the most widely adopted model for analyzing discrete choice behavior.

Three axioms were essential for the application of this model16: (1) the subject is a utility-maximizing decision-maker; (2) utilities can be represented as the linear sum of a representative term and an error term, i.e., $U_+ + e_+$ for one choice and $U_\square + e_\square$ for the other; and (3) the error term, $e_j$ ($j = +$ or $\square$), is independently and identically distributed. From the monkeys' observed choices, these subjective values can then be inferred in reverse. With two options (the + and □ targets), the probability of choosing the + target can be expressed as $p_+ = 1/\{1 + \exp[-(U_+ - U_\square)]\}$, where $U_+$ and $U_\square$ are the representative utilities of the two options.
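The logistic form follows from axioms (2) and (3) when the i.i.d. error terms are type I extreme value (Gumbel) with a common scale. That is the standard assumption of the conditional logit; the text does not state the error distribution explicitly, so it is noted here as an assumption. With unit scale:

$$
\begin{aligned}
p_+ &= \Pr\left(U_+ + e_+ > U_\square + e_\square\right) = \Pr\left(e_\square - e_+ < U_+ - U_\square\right) \\
    &= \frac{1}{1 + \exp\left[-(U_+ - U_\square)\right]},
\end{aligned}
$$

The last step holds because the difference of two independent Gumbel variables with equal scale follows a standard logistic distribution. Only the difference $U_+ - U_\square$ is identified, which is why the subjective values are inferred relative to each other from the choice frequencies.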

Stepwise regression

We performed multiple regression analyses to examine the patterns of neuronal activity during the cue period in the Ap-Av task across all session types. To identify an optimal set of variables that linearly parameterized neuronal responses, we used stepwise regression implemented with the MATLAB stepwisefit function (MathWorks, Natick, MA). This method iteratively adds variables to, or removes them from, a linear model on the basis of their statistical significance, as determined by a series of F-tests. Starting from an initial model, it assesses how including or excluding each variable changes the model's explanatory power, using the p-value of the F-statistic to compare the models with and without each variable. A variable was considered significant if this p-value was less than 0.05.
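The add/remove logic can be made concrete with a small NumPy/SciPy sketch built on partial F-tests between nested least-squares models. It is not stepwisefit itself. The entry threshold is the 0.05 given in the text. The removal threshold of 0.10 is an assumption, chosen to mirror what we understand to be stepwisefit's default. In practice the columns of `X` would be the candidate regressors listed below (Rew, Ave, Eutil, and so on), and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def _rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def _partial_f_pvalue(X_small, X_big, y):
    """p-value of the F-test comparing nested models (X_small's columns within X_big's)."""
    df_num = X_big.shape[1] - X_small.shape[1]
    df_den = len(y) - X_big.shape[1]
    rss_small, rss_big = _rss(X_small, y), _rss(X_big, y)
    f_stat = ((rss_small - rss_big) / df_num) / (rss_big / df_den)
    return float(stats.f.sf(f_stat, df_num, df_den))

def stepwise(X, y, p_enter=0.05, p_remove=0.10, max_iter=100):
    """Bidirectional stepwise selection; returns indices of retained columns of X."""
    n, k = X.shape

    def design(cols):
        # Intercept plus the currently selected predictors
        return np.column_stack([np.ones(n)] + [X[:, j] for j in sorted(cols)])

    included = set()
    for _ in range(max_iter):
        changed = False
        # Forward step: add the most significant excluded variable if p < p_enter
        candidates = [j for j in range(k) if j not in included]
        if candidates:
            p_add = {j: _partial_f_pvalue(design(included), design(included | {j}), y)
                     for j in candidates}
            best = min(p_add, key=p_add.get)
            if p_add[best] < p_enter:
                included.add(best)
                changed = True
        # Backward step: drop the least significant included variable if p > p_remove
        if included:
            p_drop = {j: _partial_f_pvalue(design(included - {j}), design(included), y)
                      for j in included}
            worst = max(p_drop, key=p_drop.get)
            if p_drop[worst] > p_remove:
                included.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(included)
```

Setting `p_remove` above `p_enter` prevents a variable that has just entered from being dropped again on the same pass.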

In our analysis of the Ap-Av task, the candidate variables included: Rew (reward value indicated by the red bar), Ave (airpuff duration indicated by the yellow bar), Eutil (expected utility in the Ap-Av scenario), Cho (binary indicator for approach [1] or avoidance [0] choices), Cho*Rew (interaction between choice and reward), Cho*Ave (interaction between choice and airpuff duration), Conf (presence of decisional conflict), and RT. These variables were chosen for their potential to elucidate the factors underlying neuronal activity patterns associated with decision-making in these tasks (Fig. 4).

Pavlovian trials

To assess changes in firing activity across different levels of reward and aversion in the cOFC and pACC, we computed the mean firing rates in the 1 s before and the 1 s after the cue and outcome events and then calculated the difference between these means (Supplementary Fig. 6b). To evaluate the significance of the observed trends, we performed linear regression on these mean differences using the MATLAB fitlm function, fitting a first-degree polynomial (a line) to estimate the slope. The fitlm output provided the t-statistic for the slope, calculated as the slope divided by its standard error, and the associated two-tailed p-value, which was used to determine the statistical significance of the trend across reward and aversion levels.
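An equivalent slope test can be run in Python with `scipy.stats.linregress`, which returns the slope, its standard error, and a two-sided p-value, the same quantities reported by fitlm for a straight-line fit. The pre/post rate helper assumes spike times and event times share a clock in seconds. Both helper names are hypothetical.

```python
import numpy as np
from scipy.stats import linregress

def rate_change(spike_times, event_time, window_s=1.0):
    """Firing rate in the window after an event minus the rate in the window before it (spikes/s)."""
    spikes = np.asarray(spike_times, dtype=float)
    pre = np.count_nonzero((spikes >= event_time - window_s) & (spikes < event_time))
    post = np.count_nonzero((spikes >= event_time) & (spikes < event_time + window_s))
    return (post - pre) / window_s

def trend_test(levels, mean_changes):
    """Fit a line to mean rate changes across levels; return slope, t-statistic and two-sided p."""
    fit = linregress(levels, mean_changes)
    t_stat = fit.slope / fit.stderr
    return fit.slope, t_stat, fit.pvalue
```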

Electrical microstimulation

For stimulation experiments, monopolar stimulation was applied. The stimulation train consisted of 200-μs pulses delivered at 200 Hz. Each pulse was biphasic and charge-balanced, with the cathodal phase leading the anodal phase. Task events were also sent to a separate PC running MATLAB (MathWorks), which controlled the EMS generated by the stimulator (Master-8, A.M.P.I.) and isolator (A365, WPI).
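The pulse timing and the resulting charge can be checked with a small sketch. Several details are assumptions. The 200 μs is read as the width of each phase, with no interphase gap. The 1 s train duration and the 100 kHz sample grid are purely illustrative. The current amplitudes are the high (150–200 μA) and low (5–15 μA) ranges reported for cOFC EMS. The function names are hypothetical.

```python
import numpy as np

def biphasic_train(amp_ua, freq_hz=200.0, phase_us=200.0, dur_s=1.0, fs=100_000):
    """Cathodal-first, charge-balanced biphasic pulse train sampled at fs (values in µA)."""
    n = int(round(dur_s * fs))
    wave = np.zeros(n)
    ph = int(round(phase_us * 1e-6 * fs))       # samples per phase (assumed 200 µs each)
    period = int(round(fs / freq_hz))            # samples between pulse onsets (5 ms)
    for onset in range(0, n - 2 * ph + 1, period):
        wave[onset:onset + ph] = -amp_ua          # cathodal (negative) phase first
        wave[onset + ph:onset + 2 * ph] = amp_ua  # equal anodal phase balances the charge
    return wave

def charge_per_phase_nc(amp_ua, phase_us=200.0):
    """Charge per phase in nC: µA × µs gives pC, divided by 1000."""
    return amp_ua * phase_us * 1e-3
```

Under these assumptions, a 200 μA pulse delivers 40 nC per phase, and a 5 μA pulse delivers 1 nC.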

To quantify the effects of EMS, we compared the decision matrices of the Stim-off and Stim-on blocks. Decision matrices were constructed by aggregating each choice datum at the corresponding point of a 100-by-100 grid and convolving the result with a 30-by-30-point square smoothing window. We then calculated t-statistics to measure the difference in avoidance and approach frequencies between the Stim-off and Stim-on blocks. Positive t-statistics (Fig. 6b, d, f; shown in red) indicate regions where EMS produced a relatively larger increase in avoidance behavior than in approach behavior. Negative t-statistics (Fig. 6b, d, f; shown in blue) indicate regions where EMS produced a relatively larger increase in approach behavior. Fig. 6 thus highlights the conditions in which EMS significantly influenced decision-making behavior, either by increasing avoidance, increasing approach, or having minimal effect.
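One way to implement the windowed comparison is to box-filter the trial counts and avoid counts on the grid, then compute a pooled-variance two-sample t-statistic of the avoid rate at each grid point. The sketch below does this with `scipy.ndimage.uniform_filter` (a 30 × 30 box). The per-window t-test formulation, the [0, 1] scaling of offer levels, and the function names are all assumptions. Empty windows yield NaN, and windows with zero pooled variance yield ±inf.

```python
import numpy as np
from scipy.ndimage import uniform_filter

GRID, WIN = 100, 30  # 100 x 100 decision matrix, 30 x 30 square smoothing window

def windowed_counts(reward, airpuff, avoid):
    """Trials and avoid choices inside the 30 x 30 window around each grid point.

    reward, airpuff: offer levels scaled to [0, 1] (assumption); avoid: 1 = Av, 0 = Ap.
    Rows index airpuff, columns index reward.
    """
    r = np.minimum((np.asarray(reward) * GRID).astype(int), GRID - 1)
    a = np.minimum((np.asarray(airpuff) * GRID).astype(int), GRID - 1)
    n = np.zeros((GRID, GRID))
    k = np.zeros((GRID, GRID))
    np.add.at(n, (a, r), 1.0)
    np.add.at(k, (a, r), np.asarray(avoid, dtype=float))

    def box_sum(m):
        # Box filter gives the window mean; scale back to a sum and remove float error
        return np.rint(uniform_filter(m, size=WIN, mode="constant") * WIN * WIN)

    return box_sum(n), box_sum(k)

def ems_t_map(stim_off, stim_on):
    """Pooled-variance t-statistic of the avoid rate (Stim-on minus Stim-off) per grid point.

    Positive values mean more avoidance with EMS.
    """
    n0, k0 = windowed_counts(*stim_off)
    n1, k1 = windowed_counts(*stim_on)
    with np.errstate(divide="ignore", invalid="ignore"):
        p0, p1 = k0 / n0, k1 / n1
        # For a 0/1 variable, the sum of squared deviations is k * (1 - p)
        pooled_var = (k0 * (1 - p0) + k1 * (1 - p1)) / (n0 + n1 - 2)
        return (p1 - p0) / np.sqrt(pooled_var * (1 / n0 + 1 / n1))
```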

For the analysis presented in Fig. 6g, each block of trials was divided into five equal segments, and the percentages of Ap and Av trials within each segment were calculated, excluding error trials. For each segment, the relative change was determined by comparing the segment of the second block (Stim-on block), during which EMS was delivered, with the matching segment of the first block (Stim-off block), during which no EMS was applied. Sessions were classified as EMS-significant or EMS-non-significant using a two-sided Fisher's exact test (p < 0.05) comparing the numbers of Ap and Av trials across the two blocks, and the relative changes were aggregated across all sessions within each group. For each segment, independent two-sample t-tests were conducted to compare the average relative changes in Ap and Av behavior between the EMS-significant and EMS-non-significant groups, with significance thresholds of p < 0.05 (*) and p < 0.001 (**). This analysis assessed whether EMS selectively modulated Ap and Av behavior across different segments of the task.
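The session classification and segment percentages can be sketched with `scipy.stats.fisher_exact`. The choice coding (1 = Ap, 0 = Av, -1 = error) is hypothetical. Splitting each block before excluding error trials is one reading of the text. Because the exact form of the "relative change" is not stated, the sketch uses a percentage-point difference (Stim-on minus Stim-off), which is an assumption.

```python
import numpy as np
from scipy.stats import fisher_exact

AP, AV, ERR = 1, 0, -1   # hypothetical trial coding

def ems_significant(off_choices, on_choices, alpha=0.05):
    """Two-sided Fisher's exact test on Ap/Av counts, Stim-off vs Stim-on block."""
    def counts(choices):
        c = np.asarray(choices)
        return [int(np.sum(c == AP)), int(np.sum(c == AV))]
    _, p = fisher_exact([counts(off_choices), counts(on_choices)], alternative="two-sided")
    return p < alpha, p

def segment_percent_ap(choices, n_parts=5):
    """Percentage of Ap among valid trials in each of n_parts equal segments of a block."""
    out = []
    for seg in np.array_split(np.asarray(choices), n_parts):
        valid = seg[seg != ERR]                   # error trials excluded within each segment
        out.append(100.0 * np.mean(valid == AP) if valid.size else np.nan)
    return np.array(out)

def segment_change(off_choices, on_choices, n_parts=5):
    """Assumed change measure: Ap percentage points, Stim-on minus Stim-off (Av is the negative)."""
    return segment_percent_ap(on_choices, n_parts) - segment_percent_ap(off_choices, n_parts)
```

The per-segment group comparison described in the text could then use `scipy.stats.ttest_ind` on the changes from EMS-significant versus EMS-non-significant sessions.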

Computational model

We implemented an OpenAI Gym environment to simulate the Ap-Av task. This environment provides a two-dimensional observation space representing reward (x1) and risk (x2) values, both normalized between 0 and 1. The action space is discrete with two possible actions: approach (1) or avoid (0). At each step, a new random observation (x1, x2) is generated. If the agent chooses to approach, it receives a reward equal to the benefit minus the cost. If the agent chooses to avoid, it receives no reward. The episode does not terminate, allowing for continuous interaction.
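A dependency-free sketch of an environment with this interface follows. It mimics Gym's `reset`/`step` conventions (the newer five-value `step` return) but does not import gym. Several elements are illustrative assumptions, not the authors' code: the class name `ApAvEnv`, uniform sampling of offers, treating x1 as the benefit and x2 as the cost, and the `aversion_weight` argument (1.0 gives benefit minus cost).

```python
import numpy as np

class ApAvEnv:
    """Minimal approach-avoidance environment with Gym-style reset/step (hypothetical)."""

    def __init__(self, aversion_weight=1.0, seed=None):
        self.aversion_weight = aversion_weight   # scales the cost term; 1.0 = benefit - cost
        self.rng = np.random.default_rng(seed)
        self.obs = None

    def _sample(self):
        # New random offer: (x1 reward, x2 risk), each in [0, 1)
        return self.rng.uniform(0.0, 1.0, size=2)

    def reset(self, seed=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.obs = self._sample()
        return self.obs.copy(), {}

    def step(self, action):
        """action 1 = approach, 0 = avoid; returns (obs, reward, terminated, truncated, info)."""
        x1, x2 = self.obs
        reward = float(x1 - self.aversion_weight * x2) if action == 1 else 0.0
        self.obs = self._sample()                # a new offer is drawn at every step
        return self.obs.copy(), reward, False, False, {}   # episode never terminates
```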

Our advantage actor-critic (A2C) model consists of two main components: an Actor (policy network) and a Critic (value network). The Actor takes the two-dimensional observation as input and outputs the probability of approaching; the Critic estimates the state value from the same input. We trained the model using a custom implementation of the proximal policy optimization (PPO) algorithm. Training spanned 1500 epochs, each consisting of 100 steps, and individual trajectories were capped at 100 steps. A discount factor (γ) of 0.99 was used to weight future rewards; this γ denotes temporal discounting, whereas γ in the logistic choice model below denotes aversion sensitivity. We used generalized advantage estimation (GAE) with λ = 0.95 to balance bias and variance in the advantage estimates. To promote exploration, we included an entropy coefficient (β) of 0.2, not to be confused with the bias parameter β of the logistic choice model below, which could be adjusted via command-line arguments.
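Generalized advantage estimation with the stated γ = 0.99 and λ = 0.95 can be written in a few lines of NumPy. Because episodes in this environment never terminate, the sketch bootstraps each trajectory segment from the critic's value of the state that follows it and uses no done mask. The function name and signature are hypothetical.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates and value targets for one trajectory segment.

    Episodes never terminate in this task, so the final step bootstraps from
    last_value, the critic's estimate for the state after the segment.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    advantages = np.zeros_like(rewards)
    running, next_value = 0.0, float(last_value)
    for t in range(len(rewards) - 1, -1, -1):
        delta = rewards[t] + gamma * next_value - values[t]   # one-step TD error
        running = delta + gamma * lam * running                 # exponentially weighted sum
        advantages[t] = running
        next_value = values[t]
    return advantages, advantages + values                     # (advantages, returns)
```

Setting λ = 1 recovers discounted Monte Carlo returns minus the baseline, and λ = 0 recovers the one-step TD error. The intermediate value trades between these two extremes.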

We chose the Adam optimizer for its adaptive learning rates, with initial learning rates of 1 × 10⁻³ for both the policy and value networks. To prevent exploding gradients, we implemented optional gradient clipping with a configurable maximum norm. Training progress was logged every 20 epochs, with model checkpoints saved at the same interval. We tracked average trajectory rewards and lengths to monitor the agent's performance over time.

To quantify EMS effects on decision-making, we also fit a simplified logistic choice model to the behavioral data, with γ (aversion sensitivity) and β (bias) as free parameters. Fitting was performed separately for Stim-off and Stim-on blocks, at the session level. Estimates were obtained via maximum likelihood estimation, and differences in parameters were evaluated across sessions using bootstrapping and paired t-tests. Full model specification and results are reported in the Results section and Fig. 8a–d, f, and g.
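The fitting procedure can be illustrated with a hypothetical utility difference, reward − γ·airpuff + β, passed through a logistic link. This functional form is only a placeholder, since the full specification is given in the Results. The sketch minimizes the Bernoulli negative log-likelihood with `scipy.optimize.minimize` and computes a percentile bootstrap interval for the mean per-session change in a parameter. For the paired comparison, `scipy.stats.ttest_rel` could be applied to the Stim-off and Stim-on estimates.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, reward, airpuff, approach):
    """Bernoulli-logit negative log-likelihood of Ap (1) / Av (0) choices.

    Hypothetical utility difference: reward - gamma * airpuff + beta.
    """
    gamma, beta = params
    z = reward - gamma * airpuff + beta
    # -log L = sum(log(1 + e^z) - y * z), computed without overflow
    return float(np.sum(np.logaddexp(0.0, z) - approach * z))

def fit_block(reward, airpuff, approach, start=(1.0, 0.0)):
    """Maximum-likelihood (gamma, beta) for one block of trials."""
    args = tuple(np.asarray(v, dtype=float) for v in (reward, airpuff, approach))
    result = minimize(neg_log_likelihood, x0=np.asarray(start), args=args)
    return result.x

def bootstrap_ci(deltas, n_boot=10_000, level=95.0, seed=0):
    """Percentile bootstrap CI of the mean session-level change (Stim-on minus Stim-off)."""
    rng = np.random.default_rng(seed)
    d = np.asarray(deltas, dtype=float)
    means = rng.choice(d, size=(n_boot, d.size), replace=True).mean(axis=1)
    tail = (100.0 - level) / 2
    lo, hi = np.percentile(means, [tail, 100.0 - tail])
    return float(lo), float(hi)
```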

Histology and imaging

The monkeys were deeply anesthetized with sodium pentobarbital and perfused transcardially with 0.9% saline followed by 4% (wt/vol) paraformaldehyde in 0.1 M NaKPO4 buffer (PB). Brains were blocked and stored in 25% glycerol (Sigma-Aldrich, G5516) with 0.1% sodium azide (MP Biomedicals, 0210289190) in 0.1 M PB at 4 °C until sectioning. The brains were then frozen on dry ice, and coronal sections were cut at 40 μm thickness on a sliding microtome. Sections were stored in 0.1% sodium azide in 0.1 M PB until use.

For immunofluorescent staining, sections were rinsed three times for 5 min each in 0.01 M PB with sodium/potassium saline containing 0.2% Triton X-100 (Sigma-Aldrich, T8787; PBS-Tx), and then incubated in blocking solution consisting of 10% normal goat serum in PBS-Tx for 1 h. The sections were subsequently incubated with primary antibody solution containing rabbit anti-GFAP [1:500] (DAKO, Z0334) in blocking solution for 24 h at 4 °C. After primary antibody incubation, the sections were rinsed three times for 5 min in PBS-Tx and then incubated for 2 h in secondary antibody solution containing goat anti-rabbit Alexa Fluor 647 [1:300] (Life Technologies, A21245) in blocking solution at room temperature. Following three additional 5 min rinses in 0.1 M PB, sections were counterstained with DAPI [1:1000] (Life Technologies, 62248) in 0.1 M PB for 2 min, then rinsed in PB for three 5 min intervals, mounted onto glass slides, and coverslipped with ProLong Gold antifade reagent (Life Technologies, P36930).

Images were captured using an AxioZoom V16 microscope (Zeiss) and exported in JPEG format.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.