Abstract
Decision confidence plays a key role in flexible behavior and (meta)cognition, but its underlying neural mechanisms remain elusive. To uncover the latent dynamics of confidence formation at the level of single neurons and population activity, we trained nonhuman primates to report a perceptual choice and the associated level of confidence with a single eye movement on every trial. Monkey behavior was well fit by a bounded accumulator model, where choice and confidence are processed concurrently, but not by a serial model, where choice is resolved first, followed by postdecision accumulation for confidence. Neurons in the lateral intraparietal area (LIP) reflected concurrent accumulation, showing covariation of choice and confidence signals across the population, and within-trial dynamics consistent with parallel updating at near-zero time lag. The results demonstrate that the primate brain can process a single stream of evidence in service of two computational goals simultaneously and suggest area LIP as a candidate neural substrate for this ability.
Main
Evolution has endowed humans and some animals with the ability to assess the quality of their own decisions. This manifests as a degree of confidence, commonly defined as an internal estimate of the probability of being correct. Confidence facilitates learning in the absence of explicit feedback1 and guides decisions that are part of a sequence or a hierarchy2,3,4,5. When feedback does occur, confidence informs whether the outcome is surprising (that is, a high confidence error or low confidence correct choice), which then drives a change in learning rate6. Indeed, it has been shown that the optimal weights for converting sensory neuron activity to decision evidence can only be obtained with a learning rule that is proportional to confidence7. To understand how confidence exerts these effects on learning and sequential decision-making, and to constrain more general theories of metacognition8, it is crucial to first establish how, where and when neural representations of confidence emerge in the brain.
In the context of dynamic (evidence accumulation) models, there are three main possibilities for the relative timing of choice and confidence formation. ‘Serial’ models posit an initial phase of accumulation bearing only on the choice, followed by a secondary process that governs confidence9,10,11. Although the readout of confidence from the second process is conditioned on which choice bound was reached, it is otherwise agnostic to the primary accumulation epoch, and there is no mechanism for reading out a provisional degree of confidence when the decision is still being formed. By contrast, ‘parallel’ models propose simultaneous initiation and temporally overlapping processes for choice and confidence, with an explicit mapping between the state of the accumulator(s) at each point in time and the probability of being correct if the decision were to terminate there12,13,14. Finally, within parallel models, we can define a subcategory (‘hybrid’) as having an initial parallel phase followed by a period of postdecision accumulation that only affects confidence15,16,17,18. Of note, the serial versus parallel question is distinct from whether confidence and choice derive from separate accumulators embodying distinct transformations of the evidence16,17,19,20; thus, we return to this important issue in the Discussion.
Recent behavioral studies in humans19,21,22 offer compelling evidence for parallel computation of choice and confidence, further supported by electroencephalography20,23,24,25,26 and transcranial magnetic stimulation27 studies. Whether nonhuman animals also possess this ability for parallel computation, and how it might be implemented at the level of neuronal populations, remains unclear. We addressed this gap by training two monkeys to report choice and confidence simultaneously in a reaction-time (RT) paradigm (‘peri-decision wagering’, peri-dw), building on earlier work in human participants12,13. Notably, unlike ‘opt-out’ or ‘uncertain-response’ paradigms22,28,29,30, the peri-dw task enables direct measurement of choice, RT and confidence, and their relationship to neural activity, on a trial-by-trial basis.
While monkeys performed the task, we recorded population spiking activity in the ventral portion of the lateral intraparietal area, LIPv31. Previous work has shown that LIPv (hereafter LIP) represents a decision variable (DV) that predicts choice and RT32, as well as confidence in an opt-out task28. We found that behavior in the peri-dw task is best explained by concurrent evaluation of evidence for choice and confidence, and that neural activity in LIP reflects the requisite parallel process. The findings support a role for posterior parietal cortex in behaviors guided by an online estimate of confidence, and more broadly favor an architecture for visual metacognition that is fundamentally parallel.
Results
We recorded 407 neurons in area LIP in the right hemisphere of two rhesus monkeys (Macaca mulatta; 207 in monkey H, 200 in monkey G) while they performed the peri-dw task (Fig. 1a). Each saccade target corresponds to a motion direction judgment (left or right) and a wager (high or low) on the correctness of that judgment. Although behaviorally the task amounts to a single choice among four options, we refer to the left–right component as ‘choice’ and the high–low component as ‘wager’ or ‘bet’, both for simplicity, and because the results support this interpretation. Monkeys were rewarded or penalized based on the conjunction of accuracy and wager (Fig. 1b)—a larger drop of juice for high versus low bets when correct, and a time penalty for high bets when incorrect (no penalty for a low-bet error). As in previous work32, monkeys showed greater accuracy (Fig. 1c) and faster RTs (Fig. 1d) when the motion was strong compared to weak. Motion strength also influenced wagering behavior in a sensible manner, namely the probability of betting high increased with greater motion strength in each direction (Fig. 1e). Notably, the behavior shows that the low-bet option did not correspond to opting out or failing to engage in the motion decision—accuracy remained high on low-bet trials, and choice and RT still varied systematically with motion strength in a manner consistent with a deliberative process (see Model fitting below).
a, After the monkey acquires fixation, four targets are presented, followed by a random-dot motion (RDM) stimulus. At any time after motion onset, the monkey can make a saccade to one of the targets to indicate its choice and wager. LIP RFs (hypothetical examples shown by magenta ovals) were estimated in a separate block of memory-saccade trials. b, Table showing the possible outcomes for each trial—if correct, a high bet yields a larger juice reward compared to a low bet, but if incorrect, a high bet incurs a 2–3-s time penalty (added to the next trial’s prestimulus fixation period). Low-bet errors were not penalized. c–e, Behavioral results pooled across two monkeys (n = 216 sessions, 202,689 trials, including sessions without neural recording). Each ordinate variable is plotted as a function of signed motion strength (% Coh; negative, leftward and positive, rightward). Choice (c) and RT (d) functions are shown conditioned on the wager (high, red; low, blue). Error bars (s.e.) are smaller than the data points. Smooth curves show logistic regression (choice) and Gaussian (RT, wager (e)) fits. f, The serial model begins with a single accumulator with symmetric bounds (standard 1D DDM). Arriving at one of the bounds terminates the primary decision (h1 versus h2, or left versus right in our task) and initiates a secondary process that accumulates evidence toward a ‘high’ or ‘low’ bound governing the wager. g, The parallel model comprises two concurrent accumulators (left) that are partially anticorrelated. The first bound to be crossed (the ‘winner’) dictates the choice and RT, whereas the losing accumulator dictates confidence by way of a mapping (right) between accumulated evidence and the log odds that the choice was correct (color scale).
As expected for a behavioral assay of confidence, the monkeys’ sensitivity was greater when betting high versus low (Fig. 1c, red versus blue; z = 34.41, P = 1.65 × 10⁻²⁵⁹, s.e. = 0.2735, logistic regression). This was true even when controlling for variability in motion energy within each coherence level, by leveraging multiple repeats of the same random seed33 (Supplementary Fig. 1a (monkey H—P = 1.3376 × 10⁻⁴, z = 3.8194; monkey G—P = 3.4828 × 10⁻⁷, z = 5.0953); Methods). Both monkeys also showed faster RTs when betting high versus low, for all but the largest motion strengths (Fig. 1d (red versus blue; asterisks indicate P < 0.0045; Wilcoxon rank-sum test, Šidák corrected)). Greater sensitivity and faster RTs for high-bet choices were evident in the large majority of individual sessions, as assessed with logistic regression (Extended Data Fig. 1 (mean difference in sensitivity, monkey H—9.2599, P = 8.2899 × 10⁻¹⁶, z-score = 7.9646; monkey G—9.3, P = 9.4419 × 10⁻³⁸, z-score = 9.4101; one-tailed Wilcoxon signed-rank test)) and Gaussian fitting (Extended Data Fig. 1 (mean difference in amplitude, monkey H— −74.4 ms, P = 1.9011 × 10⁻⁹, z-score = −5.8926; monkey G— −275.4 ms, P = 7.9589 × 10⁻¹⁸, z-score = −8.5203; one-tailed Wilcoxon signed-rank test)). Finally, we examined wagering behavior as a function of RT quantile, separately for each individual motion strength (Extended Data Fig. 2a,b). For most motion strengths, the monkeys were less likely to bet high on trials with longer RTs (Extended Data Fig. 2b; P < 0.0085 for both monkeys for every coherence except 51.2%, Cochran–Armitage test with Bonferroni correction). This pattern is strikingly similar to human behavior in a comparable task14. As in the previous study, the trend remained statistically significant when controlling for variability in motion energy across trials of a given coherence (Extended Data Fig. 2d (monkey H—P = 3.2351 × 10⁻⁴, F = 5.24; monkey G—P = 0.0056, F = 3.65; interaction term between motion energy and RT quintile using analysis of covariance)). An inverse relationship between RT and confidence is a classic psychophysical result34,35, replicated in more recent human work12,15,25. Observing it in our monkeys supports the peri-dw assay as a valid measure of confidence, and it is consistent with a family of accumulator models as described below.
Model-free analyses reveal temporal overlap in choice and confidence computations
Although the choice and wager were indicated with a single eye movement, this does not necessitate simultaneity in the processing of evidence. Different temporal windows of the stimulus could covertly be used to support the two elements of the decision, which would then only be reported when both were resolved. To test whether monkeys use a consistent serial strategy (resolving choice first and then confidence, or vice versa), we calculated the influence of stimulus fluctuations on choice and confidence as a function of time (psychophysical kernels36,37). Briefly, we quantified the motion energy for each trial and video frame by convolving the random-dot pattern with two pairs of spatiotemporal filters aligned to leftward and rightward motion38. We then partitioned trials by outcome and plotted the average relative motion energy (residuals) for each outcome as a function of time.
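The kernel computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the motion energy has already been computed and stored as a trials-by-frames array of equal length (real trials vary in duration), and the function names `residual_motion_energy` and `kernel_difference` are hypothetical.

```python
import numpy as np

def residual_motion_energy(motion_energy, coherence):
    """Subtract the mean motion-energy time course of each signed coherence.

    motion_energy: array (n_trials, n_frames), positive = rightward (assumed).
    coherence: signed coherence of each trial, length n_trials.
    """
    me = np.asarray(motion_energy, dtype=float)
    coh = np.asarray(coherence)
    residuals = np.empty_like(me)
    for c in np.unique(coh):
        idx = coh == c
        # Removing the per-coherence mean leaves only stimulus fluctuations.
        residuals[idx] = me[idx] - me[idx].mean(axis=0, keepdims=True)
    return residuals

def kernel_difference(residuals, mask_a, mask_b):
    """Mean residual time course for trials in mask_a minus trials in mask_b."""
    return residuals[mask_a].mean(axis=0) - residuals[mask_b].mean(axis=0)
```

With these helpers, a choice kernel contrasts right-choice against left-choice trials, and a confidence kernel contrasts high-bet against low-bet trials within one choice, for example `kernel_difference(res, right & high, right & ~high)`.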
Psychophysical kernels for choice are plotted in Fig. 2a,b. Rightward choices were preceded by more rightward motion energy throughout most of the trial (red line), and the same was true for leftward choices and leftward motion (blue). The kernels for right and left choices began to separate about 100 ms after motion onset and remained so until ~100 ms before saccade initiation. This clear separation suggests that the monkeys were using essentially the entire stimulus epoch, on average, to decide motion direction. For confidence, we calculated the kernels by taking the difference between motion energy time series for high and low bets associated with a specific choice4. We found that there was an excess of rightward motion energy on rightward high-bet choices compared to rightward low-bet choices (Fig. 2c,d (green traces above zero)), and similarly, there was more leftward motion energy on high-bet versus low-bet leftward choices (Fig. 2c,d, purple traces below zero). This analysis shows that both early and late motion evidence are leveraged to inform confidence in both monkeys. Comparison of the traces in Fig. 2a,b versus 2c,d suggests that the usage of the stimulus for confidence does not identically overlap with choice, especially for monkey H. However, the substantial overlap does appear to rule out a consistent temporal segregation, such as an obligatory postdecision mechanism for confidence.
a,b, Motion energy profiles conditioned on right and left choices (red and blue, respectively), aligned to motion onset and saccade onset, shown separately for monkeys H (a) and G (b). Shaded regions indicate s.e.m. Black line at top indicates when right and left traces were significantly different from one another (P < 3.6631 × 10⁻⁴, two-sided Wilcoxon signed-rank test with Šidák correction for 140 frames). c,d, Confidence kernels for monkey H (c) and monkey G (d) computed as the difference in motion energy between right-high and right-low choices (green), and the difference between left-high and left-low choices (purple), aligned to the same events as a and b. Colored lines at top of the graph indicate when the corresponding traces were significantly different from zero (Wilcoxon signed-rank test with Šidák correction). e,h, Proportion of CoM trials that start with an initial error (light red) or correct (dark red, y axis expanded in insets) and end as correct or error, respectively, as a function of motion strength. f,i, Proportion of trials that start with an initial low wager (light blue) or high wager (dark blue) and end as high or low wager, respectively, as a function of motion strength. g,j, Proportion of trials that are CoMs (of choice, wager or both) as a function of motion strength. Insets: example eye trajectories from choice (g) and wager (j) CoM trials. Data in e–j are presented as mean ± s.e.m.; e–g = monkey H, h–j = monkey G. a.u., arbitrary units; pos., position; deg, degree.
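The per-frame significance threshold quoted in this caption matches a Šidák correction over 140 comparisons if the family-wise error rate is 0.05. The caption does not state that rate, so 0.05 is an assumption here. A short check of the arithmetic:

```python
def sidak_threshold(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)

# Assumed family-wise alpha of 0.05 across 140 video frames.
per_frame = sidak_threshold(0.05, 140)
print(f"{per_frame:.4e}")  # about 3.66e-04
```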
In previous work where decisions were reported with an arm movement13,39, participants altered their reach trajectory in a manner that suggests a ‘change of mind’ (CoM) based on the continued processing of evidence after movement initiation. Saccadic choices, being fast and ballistic, are often assumed to be incompatible with CoMs; nevertheless, we identified a small subset of trials with multiple saccades in quick succession that showed certain characteristic features (Fig. 2e–j). These putative CoMs were more frequent on difficult versus easy trials (Fig. 2g,j (monkey H—P = 0.007, z = −3.70, n = 52,993; monkey G—P = 3.1983 × 10⁻⁷, z = −31.15, n = 118,105; Cochran–Armitage test)), and changes from incorrect to correct were more likely when motion strength was high (Fig. 2e,h (light red; monkey H—P = 5.7576 × 10⁻⁵, z = 10.86, n = 8,579; monkey G—P = 1.5973 × 10⁻⁵, z = 14.13, n = 24,480; Cochran–Armitage test)). Correct-to-error CoMs, occurring sparingly, were more likely when motion strength was low (Fig. 2e,h (dark red; monkey H—P = 0.001, z = −5.88, n = 44,414; monkey G—P = 6.9991 × 10⁻⁶, z = −16.72)). We also observed changes from low to high confidence, which, for both monkeys, were more frequent with greater motion strength (Fig. 2f,i (blue line; monkey H—P = 4.9597 × 10⁻⁵, z = 11.20; monkey G—P = 0.01, z = 3.38)), as shown previously in humans13. The presence of CoMs and changes in confidence, sometimes both occurring in the same trial, implies that both aspects of the decision were subject to revision at the time of the initial saccade. This is inconsistent with a strictly serial process, although it also reveals a brief window for postdecision processing even for saccadic choices40.
Model fitting supports parallel deliberation for choice and confidence
Previous studies, with some exceptions17,41, typically assume a particular temporal framework for choice and confidence rather than comparing across model classes. Here we provide a thorough comparison of serial, parallel and hybrid models fitted to the same data from the peri-dw task. The parallel and hybrid models consist of two accumulators that integrate evidence for the two motion directions, differing only in how the accumulated evidence is mapped to confidence. To explain the wager, the confidence mapping is binarized by a single free parameter—a criterion on log odds correct associated with a high bet versus a low bet28 (Fig. 1g (right)). The hybrid model adds an additional free parameter for the duration of postdecision accumulation. For the serial model, we made the simplifying assumption9,10,11 that the two accumulators were perfectly anticorrelated, equivalent to a one-dimensional (1D) drift-diffusion model (DDM). After one of the bounds is reached, evidence continues to accumulate toward a second set of bounds dictating the wager (Fig. 1f). The observed RT is the sum of the time taken to reach both bounds, as well as nondecision time.
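To make the parallel architecture concrete, the sketch below simulates a single trial of a race between two partially anticorrelated accumulators. It is a deliberately simplified illustration and not the fitted model: the parameter values are arbitrary, and the wager is set by a fixed criterion on the losing accumulator's state, which stands in for the criterion on log odds correct and ignores the time dependence of that mapping. The function name `simulate_parallel_trial` and all parameter names are hypothetical.

```python
import numpy as np

def simulate_parallel_trial(coh, rng, k=15.0, bound=1.0, rho=-0.7,
                            loser_criterion=-0.5, dt=0.005, t_max=3.0, t_nd=0.3):
    """Simulate one trial of a two-accumulator race (illustrative values only).

    coh: signed motion coherence (positive = rightward).
    Returns the choice, the RT in seconds and a binary wager.
    """
    # Correlated Gaussian noise for the [right, left] accumulators.
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]) * dt)
    drift = np.array([k * coh, -k * coh]) * dt
    x = np.zeros(2)
    for step in range(1, int(t_max / dt) + 1):
        x += drift + chol @ rng.standard_normal(2)
        if x.max() >= bound:
            break  # the first accumulator to reach the bound wins
    # If no bound is reached by t_max, the leading accumulator is taken as winner.
    winner = int(np.argmax(x))
    loser_state = x[1 - winner]
    return {
        "choice": "right" if winner == 0 else "left",
        "rt": step * dt + t_nd,  # decision time plus nondecision time
        # Simplification: bet high when the loser is far from its bound.
        "high_bet": bool(loser_state < loser_criterion),
    }
```

Because choice, RT and wager are all read out from the same pair of accumulators at the same moment, no additional accumulation epoch is needed for the wager. A serial variant would instead start a second accumulation toward ‘high’ and ‘low’ bounds after the first bound crossing and add its duration to the RT.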
The smooth curves in Fig. 3 and Extended Data Fig. 3 are fits to the serial, parallel and hybrid models for both monkeys. All models perform quite well at describing choice, RT and confidence as a function of motion strength when pooled across correct/incorrect and high/low wager trials (Extended Data Fig. 3), a testament to the explanatory power of the bounded accumulation framework. Interestingly, all three models also qualitatively capture the greater choice sensitivity and faster RTs for high versus low wager (Fig. 3 (first and second columns)). This comparison illustrates the difficulty of disambiguating the mechanism(s) underlying choice and confidence using behavior alone. Indeed, quantitative model comparison yielded mixed results for the two monkeys—hybrid and parallel models were favored over the serial model for monkey G (Bayesian information criterion (BIC), hybrid = 1.1610 × 10⁶, parallel = 1.1614 × 10⁶, serial = 1.1644 × 10⁶, n = 115,811), but the opposite was true for monkey H (BIC, serial = 7.7993 × 10⁵, parallel = 7.8179 × 10⁵, hybrid = 7.8279 × 10⁵, n = 82,449).
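For reference, the BIC trades goodness of fit against the number of free parameters, and lower values indicate more support. The sketch below uses the standard definition. The likelihoods and parameter counts in the usage example are hypothetical placeholders, since the text above reports only the resulting BIC values.

```python
import math

def bic(neg_log_likelihood, n_params, n_obs):
    """Bayesian information criterion from a minimized negative log-likelihood."""
    return 2.0 * neg_log_likelihood + n_params * math.log(n_obs)

# Hypothetical example: a model with one extra parameter is preferred only if
# it lowers the negative log-likelihood by more than 0.5 * ln(n_obs).
n_obs = 100_000
simpler = bic(50_000.0, n_params=6, n_obs=n_obs)
richer = bic(49_990.0, n_params=7, n_obs=n_obs)
print(richer < simpler)
```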
a–c, Fits of the parallel model (smooth curves) to the behavioral data (filled data points; error bars are ± s.e.) from monkey G, showing the choice (a) and RT functions (b), conditioned on high-bet and low-bet trials (red and blue, respectively), and the wager function (c) conditioned on correct versus error trials (magenta and green, respectively). d–f, Same format and data as in a–c but for the serial model. g–i, Same format and data as in a–c but for the hybrid model. j–r, Same format and model sequence as in a–i but fitted to data from monkey H.
Critically, however, the serial model fails in one key aspect—the pattern of wagering behavior conditioned on accuracy (Fig. 3 (right column)). It is commonly observed that confidence ratings increase as a function of evidence strength on correct trials, but decrease with evidence strength on incorrect trials. This characteristic ‘X’ pattern (or ‘folded-X’, if stimulus strength is unsigned) is widely accepted as a signature of confidence in behavior and brain activity30,42,43,44, yet it is not universal. Other studies12,13,45 report that confidence increases with evidence strength, even for errors, and this is what we observed as well (Fig. 3 (right column)). It is becoming increasingly clear that these conflicting findings can, in many cases, be explained by a temporal dissociation12,13,14,15,35—resolving a choice first, followed by confidence later, allows for revision of the confidence judgment upon further deliberation. When the stimulus is strong, incorrect choices are more likely to undergo such revision; hence, confidence decreases (on average) with evidence strength on error trials. Reducing or eliminating the delay between the choice and confidence report tends to flatten or reverse the X pattern12,13,15. Because the serial and hybrid models impose such a delay (implicitly, in our task), they cannot reproduce the qualitative trend in error–trial confidence we observed empirically (Fig. 3 (right column, green data points)), unless the postdecision epoch is very brief, as it was in the best-fitting hybrid model for monkey G (60 ms; Fig. 3i). This qualitative miss is not reflected in the above BIC results because the model likelihoods were calculated using the unconditioned wager data, meaning the split between correct and error trials (Fig. 3 (right column, green versus purple curves)) is a prediction, not a fit. 
Quantifying the accuracy of this prediction using the error–trial wagers establishes the parallel and serial models as the most and least supported, respectively, for both monkeys (negative log-likelihood, for monkey H—parallel = 8.4320 × 10³, hybrid = 8.4370 × 10³, serial = 8.6572 × 10³, n = 13,176; for monkey G—parallel = 1.5636 × 10⁴, hybrid = 1.5699 × 10⁴, serial = 1.5993 × 10⁴, n = 13,755).
In summary, although each model variant is flexible enough to capture most behavioral trends, a holistic model comparison favors parallel accumulation of evidence for a decision and associated level of confidence. We next examined whether decision-related activity in parietal cortex is consistent with such a mechanism.
LIP neurons show signatures of concurrent accumulation
Putative DV representations can be found in several subcortical and cortical areas, characterized by a ‘ramping’ pattern of neural activity (or decoded proxy thereof) that scales with evidence strength and often converges upon decision termination31,46,47. Although this pattern does not uniquely identify a process of evidence accumulation, a large body of work supports the assertion that LIP neurons reflect such a process during random-dot motion discrimination31,48. We reasoned that, if choice and confidence were resolved concurrently during motion viewing (parallel model), the ramping activity should begin to predict both dimensions of the eventual saccade simultaneously, classically around 200 ms after motion onset32. Alternatively, if choice was deliberated first, followed by confidence (serial model), this temporal separation should be evident in the divergence point of neural activity traces conditioned on the four outcomes.
These traces are shown in Fig. 4a for four example neurons. The highest firing rate (FR) corresponds to choices made into the receptive field (RF) of the neuron, which was almost always in the left (contralateral) hemifield but was equally likely to overlap the high or low wager target. The relative ordering of the remaining three traces differs across neurons, possibly due to idiosyncratic RF properties or nonspatial decision signals. The key observation is that the activity preceding saccades to the preferred wager target (low or high) diverges from the activity for the other wager target (high or low) at about the same time as it diverges from the traces for ipsilateral choice (right-low and right-high). This pattern is present in each example neuron and in the population averages (Fig. 4b,c). There is no evidence that ramping activity consistently predicts the left–right choice sooner than the high–low one (or vice versa), as would be expected under a serial model. Instead, to the extent the activity reflects accumulation of evidence favoring the target in the RF (see below), the results support a model in which such accumulation underlies concurrent deliberation toward a choice and confidence judgment.
a, FR of example units split by choice and wager outcome, aligned to motion onset and saccade onset. b, Population average FR (normalized) for neurons with an RF overlapping the left-low target. Colored bars at the top indicate when the corresponding FR is significantly less than the FR for choices into the RF (one-tailed Wilcoxon rank-sum test with Šidák correction). c, Same as b but for left-high neurons. Only low-coherence trials (0, +/−3.2%, +/−6.4%) are included in a–c. d, Theoretical autocorrelation matrix of a standard ideal accumulation process (left) and a delayed accumulator (right; Methods). e, Projection of theoretical autocorrelations for top row (gray solid) and first juxtadiagonal (gray dotted) along with the corresponding empirical data after fitting the phi parameter (Methods). Blue and red traces represent the two populations shown in b and c, respectively, pooling the data from both monkeys. Left: standard accumulator. Right: delayed accumulator. The empirical traces (blue and red) are not identical between the standard and delayed conditions because the phi parameter is fit independently (Methods).
To dig deeper into the nature of the observed ramping signals, we tested for statistical signatures of a bounded accumulation process48,49—(1) increasing variance of the underlying rate (variance of the conditional expectation, VarCE) followed by a collapse near decision termination, and (2) a characteristic autocorrelation pattern in this latent signal (correlation of conditional expectation, CorCE; Methods). The results supported both sets of predictions. Beginning 200 ms after motion onset, VarCE shows a roughly linear increase for at least the next 400 ms (Extended Data Fig. 4). For CorCE, the results from both monkeys were well-matched to the predictions, namely an increase in the correlation between neighboring time bins as time elapses, and a decrease in correlation between bins as the separation between them increases (Fig. 4d,e (left, monkey G—coefficient of determination R2 = 0.83 and 0.82 for left-high and left-low neurons, respectively; monkey H—R2 = 0.66 and 0.80 for left-high and left-low, respectively)).
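The logic of VarCE follows from the law of total variance: the variance of spike counts across trials is the sum of the variance of the underlying rate and the expected point-process variance, and the latter is commonly assumed to be proportional to the mean count with a constant phi. The sketch below applies that standard decomposition. It assumes phi is supplied by the caller and that counts come from a single condition (in practice, residuals are usually pooled after removing condition means), and it is not the authors' implementation.

```python
import numpy as np

def var_ce(spike_counts, phi):
    """Variance of the conditional expectation per time bin.

    spike_counts: array (n_trials, n_bins) of counts from one condition.
    phi: assumed ratio of point-process variance to mean count.
    """
    counts = np.asarray(spike_counts, dtype=float)
    total_var = counts.var(axis=0, ddof=1)   # variance across trials
    mean_count = counts.mean(axis=0)
    # Total variance minus the estimated point-process variance.
    return total_var - phi * mean_count
```

For a rate that performs a random walk across time bins, this quantity grows approximately linearly with time, which is the signature tested in the text.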
These dynamics in variance and autocorrelation are consistent with an underlying neuronal mechanism that implements accumulation of noisy evidence, and are not easily explained by alternative accounts of LIP ramping activity, such as a gradual shift of attention or simple movement preparation. Critically, the patterns were present over the same time window in both the high-bet and low-bet preferring populations. This appears to refute a version of the serial model where choice is initially resolved by considering only one pair of targets, followed by a shift to the other pair after some time has elapsed. We explicitly tested this by computing the expected autocorrelation for a simulated process in which accumulation is delayed by a random amount of time. The delayed process provided a qualitatively inferior account of the empirically derived CorCE values, relative to standard (synchronous) accumulation (Fig. 4e (left versus right; left-high neurons in monkey H—P = 0.012, n = 15; left-low neurons in monkey H—P = 4.2725 × 10⁻⁴, n = 15; left-high neurons in monkey G—P = 6.1035 × 10⁻⁴, n = 15; left-low neurons in monkey G—P = 0.0015, n = 15; Wilcoxon signed-rank test)). Taken together, the results support a parallel model in which deliberation occurs simultaneously across the high and low target pairs. What remains to be tested is whether and when these accumulation signals are predictive of the monkey’s choice and wager on individual trials.
Single-trial decoding reveals links between choice and confidence signals
Most of our analyses so far have relied on trial averages, potentially obscuring the dynamics of individual decisions. Therefore, we turned to a population-decoding approach50,51 to more directly address the question of parallel versus serial deliberation. We trained two logistic classifiers, one for the binary choice and the other for the binary wager, on the population spike counts (mean = 14 units per session), with decoder weights allowed to vary over time. We then extracted a ‘neural DV’, also referred to as prediction strength or certainty50,51, which is simply the log odds of a particular choice or wager as a function of time based on the decoded population spike counts on a given trial.
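As a rough illustration of how a logistic decoder yields a neural DV, the sketch below fits an L2-regularized logistic regression to spike counts from one time bin and returns the log odds for held-out trials. It is a generic numpy implementation under stated assumptions: the regularization strength, learning rate, z-scoring of counts and the function names `fit_decoder` and `neural_dv` are all hypothetical, and the procedure actually used (time binning, cross-validation) is described in the Methods rather than here.

```python
import numpy as np

def fit_decoder(counts, labels, l2=1.0, lr=0.1, n_iter=3000):
    """Fit a logistic decoder by gradient descent on z-scored spike counts.

    counts: array (n_trials, n_units); labels: 0/1 per trial.
    """
    X = np.asarray(counts, dtype=float)
    y = np.asarray(labels, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9
    Z = np.column_stack([np.ones(len(X)), (X - mu) / sd])  # intercept + units
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        grad = Z.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:] / len(y)  # penalize weights, not the intercept
        w -= lr * grad
    return {"w": w, "mu": mu, "sd": sd}

def neural_dv(model, counts):
    """Log odds of label 1 for each trial: the single-trial 'neural DV'."""
    X = np.asarray(counts, dtype=float)
    Z = np.column_stack([np.ones(len(X)), (X - model["mu"]) / model["sd"]])
    return Z @ model["w"]
```

Repeating the fit in successive time bins, once with choice labels and once with wager labels, gives two DV time courses per trial that can be compared directly.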
For both monkeys, the neural DV for choice ramped up starting about 200 ms after motion onset (Fig. 5a). The DV dynamics differed for the two animals, but both showed a ramping slope that depended on motion strength (monkey H, P < 0.001; monkey G, P < 10⁻⁴, linear regression). Cross-validated prediction accuracy also ramped up beginning at this time, simultaneously for both the choice and confidence decoders (Fig. 5b). At their peaks, both decoders performed well above chance on the test set, but a notable difference is the timing of the peaks, which for choice is just before saccade onset and for wager is slightly after the saccade (Fig. 5b). The time course of the choice and confidence DVs (Fig. 5c) mirrored the prediction accuracy traces, ramping in lockstep throughout most of the trial but with a subtle offset near the time of the saccade. This implies the persistence of a confidence-related signal even after the commitment to a wager, possibly reflecting continued deliberation (or a top-down signal) that could drive CoMs or even inform the next decision52. However, the temporal offset was absent when using an alternative decoding approach with weights from a fixed (peri-saccadic) window (Supplementary Fig. 2), so this aspect of the results should be interpreted with caution. The details of the decoding method did not affect the main result of temporal congruency in the ramping of choice and wager signals during motion viewing.
a, log odds (neural DV) quantifying prediction strength for the choice decoder as a function of time, aligned on motion onset and conditioned on motion strength. Left: data from monkey H. Right: data from monkey G. b, Prediction accuracy (proportion correct binary classification in the test set), for both the choice (gray) and wager (brown) decoders, as a function of time and aligned to motion onset and saccade onset. Shaded regions around the traces indicate s.e.m. Data are from both monkeys. Gray and brown bars at the top indicate when the accuracy for the corresponding decoder was significantly greater than chance (one-tailed Wilcoxon rank-sum test with Šidák’s correction). Black bar indicates when prediction accuracy was significantly different for choice versus wager (Wilcoxon signed-rank test with Šidák correction). c, log odds for choice and wager decoders as a function of time and aligned to motion onset and saccade onset. Line color, error shading and significance bars are similar to b. d, Angular difference between the indicated pairs of decoding vectors as a function of time. Black solid line, choice versus wager on four-target trials. Light gray dashed line, wager on four-target versus ‘wager’ on two-target control trials (vertical saccade component, collapsed across both types of control trials where only the high or only the low targets were present). Shaded regions indicate s.e.m. e, Wager decoder log odds as a function of time, on trials with the standard four-target configuration (solid trace) compared to two-target control trials (dashed trace). Shaded regions indicate s.e.m.
Stepping back from the issue of temporal alignment, an important unanswered question is the degree to which choice and confidence signals overlap on a cell-by-cell basis. We computed the correlation between the fitted decoder weights (choice versus wager), after collapsing across time and converting them to an absolute magnitude, and found a modest but highly significant correlation across our sample (Extended Data Fig. 5a (monkey H, r = 0.18; monkey G, r = 0.21; P < 0.001 for both, permutation test)). In addition, the distribution of the difference between choice and confidence weights was unimodal (Extended Data Fig. 5b (Hartigan’s dip test, P > 0.9 for both monkeys individually)), suggesting a continuum of contributions to choice and confidence and not two distinct subpopulations. This raises the question of whether the population can disentangle the two signals to prevent interference, especially since the evidence informing choice and confidence comes from a single source (the motion stimulus). To address this, we calculated the angular distance between the decoding vectors for choice versus confidence, separately for each session. During the deliberation period these vectors were approximately orthogonal (Fig. 5d (solid trace)), facilitating the readout of confidence by a downstream region, potentially at any time, although they were closest to orthogonal around the time of the saccade. Finally, to test whether concurrent choice and confidence signals could be a trivial consequence of motor preparation, we trained a separate decoder to predict the vertical component of the saccade using a set of control trials where only one pair of wager targets (either the high or low pair) was present on a given trial. The resulting decoding vector was nearly orthogonal to the vector for decoding the wager on standard four-target trials (Fig. 5d (dashed trace)), and showed a qualitatively different log odds profile (Fig. 5e), suggesting that the confidence signal is distinct from eye movement preparation in the absence of a wager decision.
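The angular distance between two decoding vectors can be illustrated with a minimal Python sketch. It is not the analysis code used in the study (which was written in MATLAB); the function name `angle_between_deg` and the example weights are hypothetical. The angle is obtained from the normalized dot product of the two weight vectors, so 90° corresponds to orthogonal readouts.

```python
import math

def angle_between_deg(u, v):
    """Angle in degrees between two decoder weight vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("weight vectors must be nonzero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

# Hypothetical weights for a three-unit population.
choice_weights = [0.8, -0.1, 0.3]
wager_weights = [0.1, 0.9, 0.05]
angle = angle_between_deg(choice_weights, wager_weights)
```

In practice such an angle would be computed per session and per time bin, as described in the text.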
Given this multiplexed representation, we wondered whether the strength of choice decoding might predict the binary classification by the wager decoder on a trial-by-trial basis. To test this, we partitioned trials according to whether the wager decoder predicted a high or a low bet (P(high) in the peri-saccade epoch greater or less than 0.5, respectively). We then averaged the DV from the choice decoder, using only 0% coherence trials, and found that it was higher for decoded-high versus decoded-low trials (Fig. 6a). This indicates that the strength with which neural activity predicts the upcoming choice covaries with the probability that the same population predicts a high bet, consistent with a tight functional link between choice and confidence signals in LIP.
a, Neural DV from the choice decoder as a function of time, aligned to saccade onset for leftward (contralateral) choices and 0% coherence trials only. Traces are separated by whether the wager decoder predicted a high (purple) or low (gold) bet. Shaded regions are s.e.m. Black bar at the top indicates a statistically significant difference between the traces (one-tailed Wilcoxon signed-rank test with Šidák correction). b, Same as a but for rightward (ipsilateral) choice trials. c, log odds from the choice decoder (leftward choices only) as a function of the probability of a high bet predicted by the wager decoder, based on the time window from −0.2 s to 0.1 s relative to the saccade. Each dot is an individual trial and the black line is a linear regression. Unlike a, purple and gold represent the behavioral wager outcome and not the decoder prediction. d, Same as c but for rightward choice trials. e, Corrected R2 values from a linear regression relating trial-by-trial choice decoding strength and wager decoder probability (P(high)), as a function of time lag between them. Values are computed using the time window 0 to 0.4 s from motion onset (MO). The s.e. is indicated by the shaded regions (barely visible). f, Same as e but using the window −0.4 to 0 s relative to saccade onset (SO). g, Same as e but using the window −0.2 to 0.2 s from saccade onset.
Remarkably, this link was only present for leftward (contralateral) and not rightward (ipsilateral) choices (Fig. 6a versus 6b). To further investigate this stark contrast, we performed a trial-by-trial analysis of the neural DVs centered around the saccade epoch (Fig. 6c,d). After separating the data by the monkey’s wager on each trial (high, purple; low, gold), we confirmed that the wager decoder strongly predicted the behavioral confidence report, irrespective of choice (Fig. 6c,d (top histograms); P < \(10^{-250}\), z = 56.5160, confidence interval (CI) = (3.554 .3809) and P = 1.139 × \(10^{-188}\), z = 33, CI = (0.1769 0.1993)). However, the wager prediction was only positively correlated with choice strength for leftward (contralateral) choices (Spearman rank correlation, \(\rho\) = 0.22, P = 5.287 × \(10^{-22}\) and \(\rho\) = −0.093, P = 5.212 × \(10^{-5}\) for contra and ipsi choices, respectively). Because LIP RFs are mostly contralateral, this means the neurons that represent the unchosen option (associated with the ‘losing accumulator’ in the behavioral model) do not show a relationship between choice strength and wager prediction, although they still predict the wager itself (Fig. 6d (top histograms)). We considered an alternative model that reads out confidence from the winning accumulator18, but this failed to capture the differences in accuracy and RT conditioned on the wager (Supplementary Fig. 4), unless there was at least ~80 ms of postdecision accumulation for confidence. However, this, in turn, predicted a prominent folded-X pattern (Supplementary Fig. 5f,l) that was absent in the data (see Discussion).
Having established a link between choice decoding strength and wager prediction (at least for contralateral choices), we can now examine the details of this relationship at a finer time scale and revisit the temporal offset shown in Fig. 5b,c. We fit a linear regression model relating choice decoder strength at time t to decoded wager probability at time t + ∆t, where ∆t ranges from −200 to +200 ms. During the deliberation phase (200–600 ms after motion onset and 0–400 ms before saccade onset), the strongest relationship between choice strength and wager probability was at a time lag of zero (Fig. 6e,f (corrected R2 = 0.132 and 0.216, respectively)). This result held even when the decoding vectors were realigned to be fully, not just approximately, orthogonal (Supplementary Fig. 3). Interestingly, the period centered around the saccade (−0.2 to 0.2 s; Fig. 6g) gave rise to two peaks, one at zero lag (corrected R2 = 0.124) and the other at a lag of −0.2 s (choice preceding wager; corrected R2 = 0.125). We speculate that this late peak may indicate a re-evaluation of evidence informing the wager, similar to replaying the last few samples used for the choice as a substitute for external input. Regardless, the main takeaway is the prominent peak at zero lag, which is consistent with the near-simultaneous updating of internal representations guiding a decision and confidence judgment. This is a surprising result given the evidence for serial bottlenecks in many cognitive processes53,54,55,56.
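The logic of the lagged regression can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the study's MATLAB code: lags are expressed in samples on a common time grid, samples are pooled across trials and time points within the analysis window, and the 'corrected' R2 is taken to be the usual adjusted R2 for a single predictor. All function names are hypothetical.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)

def adjusted_r_squared(r2, n_obs, n_predictors=1):
    """Adjusted R^2 (assumed here to correspond to the 'corrected' R2)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def lagged_r2(choice_dv, wager_p, lags, t_start, t_stop):
    """Adjusted R^2 between choice DV at t and wager probability at t + lag.

    choice_dv, wager_p: lists of trials, each a list of samples on the same grid.
    lags: iterable of integer lags in samples; t_start/t_stop bound the window.
    """
    result = {}
    for lag in lags:
        xs, ys = [], []
        for dv, pw in zip(choice_dv, wager_p):
            for t in range(t_start, t_stop):
                if 0 <= t + lag < len(pw):
                    xs.append(dv[t])
                    ys.append(pw[t + lag])
        result[lag] = adjusted_r_squared(r_squared(xs, ys), len(xs))
    return result
```

Plotting the returned values against lag gives a curve of the kind shown in Fig. 6e–g; a peak at zero lag indicates that the two decoded quantities covary most strongly when sampled at the same moment.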
Discussion
The neurophysiological basis of metacognition has become more accessible in recent years through the development of behavioral assays of confidence in nonhuman animals28,29,42,57,58. A longstanding goal is to connect the rich literature on process models for confidence with their implementation at the level of neural populations and circuits. One approach considers how decision accuracy, speed and confidence can be jointly explained within the dynamic framework of bounded evidence accumulation9,12,17,35, an idea anticipated by the ‘balance-of-evidence’ hypothesis described in ref. 59. Such a framework is motivated by the critical role of response time in psychophysical theory and experiment60 and its strong empirical link to confidence going back at least a century34.
Embracing a dynamic model still leaves open questions about the temporal relationship between choice and confidence computations. Several authors have emphasized postdecisional processing61,62, typically formalized by serial models where evidence is integrated for confidence only after the primary decision is terminated9,10,11. This idea follows naturally from the definition of confidence as the estimated probability correct conditioned on a choice63, and it is sensible to exploit any additional information acquired (or generated internally) after initial commitment. However, extending the accumulation process costs both time and cognitive effort64, and there are advantages to maintaining a provisional degree of confidence during decision formation14,65. Most decisions evolve in the context of ongoing or planned actions, executed with a degree of vigor proportional to expected utility66, which in turn depends on predicted accuracy (that is, confidence). Recent work also suggests a role for confidence in strategically modulating the ongoing decision process, including adjusting the termination criteria19 and adapting to rapid changes in evidence strength20,67. Finally, considering the role confidence has in sequential and hierarchical decisions4,5, computing confidence in parallel with each decision should make such sequences more efficient68.
Although our study was not designed to directly probe such uses for ‘online’ confidence, it adds to the growing evidence that such a representation is generated and available during formation of the decision19,20,21,23,24,25,27. It is notable that this signal is present within the same sensorimotor populations engaged in decision formation, raising the possibility of a local mechanism for confidence-gated changes in choice bias52 or the tuning of sensory weights during perceptual learning7,69. Confidence computations have also been extensively linked to prefrontal, orbitofrontal and cingulate cortices8,26, implying that confidence may be ‘fed back’ to sensorimotor cortices like LIP. This could, for example, explain the subtle temporal offset in our choice and wager decoders near decision termination (Figs. 5b,c and 6g), but seems harder to square with their tight alignment throughout most of the decision epoch (Fig. 6e,f). It would be interesting to connect these phenomena to recent work suggesting that a separate ‘confidence accumulator’ exerts control over the primary decision (or ‘motor’) accumulator19,20. Other accounts propose that confidence is derived from an estimate of decision reliability or ‘meta-uncertainty’41,70,71, or governed by an accumulation of evidence bearing on stimulus detectability or discriminability16,17. A combination of large-scale recordings and causal manipulations will likely be necessary to develop a mechanistic account that unifies these observations and explains why the brain computes such a diverse and distributed array of metacognitive signals.
We found that the strength of the decoded choice prediction was correlated with the probability of a high bet predicted by the wager decoder (Fig. 6a,c). This is important because it suggests a direct link between the DV representation and readout of confidence, as predicted by models instantiating a so-called ‘common mechanism’13,28,35. However, the relationship held only for contralateral choices (Fig. 6b,d), that is, for trials where the recorded neurons are presumed to represent the winning accumulator in a race model. Superficially, this seems to contradict a basic tenet of such models12, namely that the losing accumulator implements the mapping between accumulated evidence and probability correct, because the winner is always at the bound at the time of the decision. How can we reconcile this apparent conflict between the model (which posits a losing-accumulator mechanism) and the neural evidence supporting a stronger link with the winner?72 An added complexity is that our four-target task presumably entails competition between four spatially defined neuronal pools, in contrast to the two competing accumulators of the model. This exemplifies the gap between the level of explanation provided by psychological models and the implementation of a given confidence-guided behavior. Bridging this gap will require additional modeling and simulation, ideally constrained by the dynamics and empirical correlation structure across subpopulations73 as well as the heterogeneity of functional cell types that may be hidden in population averages72,74.
A limitation of the current study is that confidence is mapped onto a stable motor action, specifically the saccade to a high or low target, whose positions do not change within a session. We found this to be necessary for achieving consistent behavioral performance, but it does present interpretational challenges related to the overlap of cognitive and motor-planning signals. Could the results be explained solely by motor preparation to one of four independent spatial targets? We do not think so. First, the observed signatures of evidence accumulation (Figs. 4e and 5a) suggest that ramping activity reflects more than mere saccade planning. Second, the population activity pattern predicting the wager was distinct from the pattern preceding the same saccade on control trials when only one pair of targets was available (Fig. 5d,e). This does not mean the results are incompatible with embodied or ‘intentional’ theories of decision-making75,76, quite the opposite. We interpret the overlap of metacognitive and premotor signals as supporting and extending the idea of an ‘intentional framework’ for decisions among actions, characterized by the continuous flow of information to the motor system77. In this context, it is intriguing that two physical dimensions of a motor plan (horizontal and vertical) can be updated simultaneously based on two distinct transformations of the input (a categorical judgment and a prediction of accuracy). These were by no means guaranteed to be computed in parallel78; in fact, a recent study of two-dimensional (2D) decisions using a similar target configuration55 suggests that there is a bottleneck preventing simultaneous incorporation of two evidence streams into a single DV. Evidently, there is no such bottleneck for a single evidence stream informing choice and confidence, at least as far as we can resolve with neural decoding.
Our study reveals a type of joint representation of choice and confidence in a sensorimotor region. Notably, even early visual cortical neurons carry information about uncertainty79,80 and predict subjective confidence81,82. A key question is how sensory activity is read out in a feedforward manner to update premotor and higher-order metacognitive representations. On the other hand, feedback onto sensory cortices has been linked to a form of ‘belief’ in the context of perceptual inference83,84, but whether this computation is related to the explicit sense of confidence and its neural correlates is unclear. Future efforts to bridge the feedforward–feedback dichotomy could help uncover general principles by which population dynamics establish internal belief states, while simultaneously generating adaptive behavior in uncertain and changing environments.
Methods
Subjects and experimental procedures
Two male rhesus monkeys (M. mulatta, 6 and 8 years old, 8 and 10 kg, respectively) were handled according to the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the Institutional Animal Care and Use Committee at Johns Hopkins University (protocol PR18M02). Standard sterile surgical procedures were performed to place a polyether-ether-ketone recording chamber (Rogue Research) and titanium head post under isoflurane anesthesia in a dedicated operating suite. The recording chamber was positioned over a craniotomy above the right posterior parietal cortex of both animals for access to the intraparietal sulcus and posterior third of the superior temporal sulcus. The chamber and head post were secured in place using dental acrylic, anchored with ceramic bone screws.
Experimental apparatus
Monkeys were seated in a custom-built primate chair in a sound-insulated booth facing a visual display (ViewPixx, VPixx Technologies; resolution 1080 × 960, refresh rate 120 Hz; viewing distance 52 cm) and infrared video eye tracker (Eyelink 1000 Plus, SR Research). Experiments were controlled by a Linux PC running a modified version of the PLDAPS system85 (version 4.1) in MATLAB (version 2016b, The MathWorks). Visual stimuli were generated using Psychophysics Toolbox 3.0 (ref. 86). For correct responses, the monkey was given a fluid reward that was dispensed using a solenoid-gated system.
Neurophysiology
Recording probes (32-channel or 128-channel Deep Array, Diagnostic Biochips) were positioned with the aid of a polyether-ether-ketone grid secured inside the recording chamber. A sharpened guide tube was inserted through a grid hole so that the tip of the tube just punctured the dura, then a probe was advanced through the guide tube into the brain using a motorized microdrive (40-mm MEM drive, Thomas Recording). Bandpass-filtered voltage signals were collected using the Open Ephys acquisition board and software87 (versions 0.5.2 and 0.5.3). Post hoc analysis for identifying single neurons and multi-unit clusters was done using Kilosort 2.0 (ref. 88) with additional curation using phy2 software (https://phy.readthedocs.io/en/latest/). Data analysis was performed with custom MATLAB code.
Targeting of LIPv was achieved by selecting grid locations based on a postsurgical structural magnetic resonance imaging scan, in which the chamber and grid holes were well-visualized. We compared the magnetic resonance images to published reports and atlases89 to estimate the depth of LIPv (typically 8–12 mm from the dura in our vertical penetration angle) and corroborated the targeting using white–gray matter transitions and physiological response properties during the mapping tasks described below. After reaching the target, we let the probe settle for 30–60 min before the start of the experiment. A total of 407 units (single = 148 and multi = 56 in monkey H; single = 107 and multi = 93 in monkey G) were collected over 29 sessions (12 for monkey H, 17 for monkey G). No qualitative differences were detected in the results when comparing single and multi-units, so they were pooled for all analyses.
Memory-saccade task
Sessions began with a standard memory-guided saccade task to identify neurons with spatially selective activity during the delay period90 and to coarsely map their RFs. Monkeys were instructed to gaze at a central fixation point (1.5° radius acceptance window), after which a red target (0.42° diameter circle) was flashed for 100 ms, located in one of several locations evenly spaced in polar coordinates. The coordinates consisted of three different radii (eccentricities) and either 10 or 12 angular positions, resulting in a total of 30 or 36 unique target locations. Each target location was presented 10 times in pseudorandom order, requiring a total of 300 or 360 trials. While fixating, the monkey had to remember the location of the target, and after a delay of 0.8 s the fixation point was extinguished, instructing the monkey to make a saccade to the remembered location. The RFs were estimated online during/after the memory-saccade block by acquiring multi-unit spikes (threshold crossings) on each recording channel and plotting the mean FR during the memory delay as a function of target location in a 2D heat map. These RF maps guided the placement of the four targets for the main decision task, ensuring that at least one target overlapped with the RF of multiple neurons in the recorded population. Neurons whose RFs did not overlap with any target were excluded from further analysis.
Main task
The monkeys were trained to perform an RT direction discrimination task with simultaneous report of choice and confidence (‘peri-decision wagering’; Fig. 1a). To initiate a trial, the animals acquired fixation on a target at the center of the screen (0.21° diameter). After a delay of 0.5 s, four targets appeared, positioned diagonally from the center of the screen, each representing a choice (left or right) and a wager, or bet (high or low). The targets representing high (low) bets were always placed in the upper (lower) quadrants (mean ± s.d. of (x, y) target positions relative to fixation: left-high = −7.3 ± 2.1°, 7.3 ± 1.5°; right-high = 7.3 ± 2.1°, 7.3 ± 1.5°; left-low = −6.9 ± 2.3°, −3.4 ± 1.4°; right-low = 6.9 ± 2.3°, −3.4 ± 1.4°). Each left–right pair was presented symmetrically around the vertical meridian, but high-bet targets were typically 2–5° more eccentric than low-bet targets to counteract the monkeys’ tendency to bet high more often than low.
After another brief delay (0.3–0.6 s, truncated exponential), a dynamic RDM stimulus was presented in a circular aperture. Motion strength, or coherence, was sampled uniformly on each trial from the set (0%, ±3.2%, ±6.4%, ±12.8%, ±25.6%, ±51.2%), where positive values indicate rightward motion and negative values indicate leftward motion. The stimulus was constructed as three independent sets of dots32, each appearing for a given video frame, then reappearing three frames (25 ms) later. Upon reappearing, a given dot was either repositioned horizontally to generate apparent motion in the assigned direction (speed = 2–16° s−1, held constant within a session) with probability given by the coherence on that trial, or otherwise was replotted randomly within the aperture.
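The dot-update rule can be illustrated with a short Python sketch. It is a simplified illustration, not the stimulus code used in the experiments (which ran in MATLAB with Psychophysics Toolbox). Coherence is expressed as a signed fraction (for example, 0.512 for 51.2% rightward), the displacement per update is speed × 25 ms, and the aperture radius and the handling of dots that leave the aperture (random replotting) are assumptions introduced here.

```python
import random

def random_point_in_aperture(radius, rng):
    """Uniform random position inside a circular aperture (rejection sampling)."""
    while True:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return (x, y)

def update_dots(dots, coherence, step_deg, radius, rng):
    """Update one dot set when it reappears three frames later.

    Each dot is displaced horizontally with probability |coherence|;
    otherwise it is replotted at a random position within the aperture.
    """
    direction = 1.0 if coherence > 0 else -1.0
    updated = []
    for x, y in dots:
        if rng.random() < abs(coherence):
            x_new = x + direction * step_deg
            if x_new * x_new + y * y <= radius * radius:
                updated.append((x_new, y))
            else:
                # Assumption: dots leaving the aperture are replotted randomly.
                updated.append(random_point_in_aperture(radius, rng))
        else:
            updated.append(random_point_in_aperture(radius, rng))
    return updated

# Example: speed of 8 deg/s and a 25-ms interval give a 0.2 deg step.
rng = random.Random(0)
dots = [random_point_in_aperture(2.5, rng) for _ in range(20)]
dots = update_dots(dots, coherence=0.256, step_deg=8.0 * 0.025, radius=2.5, rng=rng)
```

In the full stimulus, three such dot sets would be interleaved across successive video frames, so each set is updated once every three frames.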
When ready with a decision, the animal could report its choice and wager by making a single saccade to one of the four targets. When the eyes moved 1.5° away from the fixation point, the RDM and fixation point were extinguished while the four targets remained visible. When the eye position reached one of the four targets, the animal was required to hold fixation within 1.5° of that target for 0.1 s to confirm the outcome. Finally, the animal was either rewarded or given a time penalty depending on the conjunction of accuracy (choice corresponding to the sign of coherence) and wager (Fig. 1a (right)). The penalty for a high-bet error was applied to the subsequent trial, where the animal was required to fixate the central target for a longer period of time (2–3 s) before RDM onset. Reward sizes (~0.21 ml for high bets and ~0.19 ml for low bets) and penalty times were chosen to encourage a wide range of wager frequency across different levels of motion strength.
Quantification and statistical analysis
Cell selection
For consistency with previous work, we limited the analyses shown in Fig. 4 and Extended Data Fig. 4 to units with spatially selective persistent activity, based on the memory-saccade task. We quantified spatial selectivity using a discrimination index (DDI)91:
$$\mathrm{DDI}=\frac{{R}_{\max }-{R}_{\min }}{{R}_{\max }-{R}_{\min }+2\sqrt{\mathrm{SSE}/(n-M)}}$$
where Rmax and Rmin are the mean FRs during the delay period at the target location with the highest and lowest response, respectively. SSE is the sum-squared error around the mean responses, n is the total number of trials and M is the total number of unique target locations. For Fig. 4 and Extended Data Fig. 4 (all analyses except the population logistic decoder), only units with DDI > 0.45 were included (n = 195). This criterion is somewhat arbitrary but was based on the qualitative inspection of numerous individual RF maps and chosen before designing and executing the main analyses.
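For concreteness, the index can be computed as in the following Python sketch. It assumes the commonly used form of the discrimination index, DDI = (Rmax − Rmin)/(Rmax − Rmin + 2√(SSE/(n − M))), which matches the quantities defined above; the function name and the example firing rates are hypothetical.

```python
import math

def discrimination_index(rates_by_location):
    """DDI from delay-period firing rates.

    rates_by_location: list with one entry per target location, each a list
    of single-trial firing rates (spikes per second) at that location.
    """
    means = [sum(r) / len(r) for r in rates_by_location]
    r_max, r_min = max(means), min(means)
    # Sum-squared error of single trials around their location means.
    sse = sum((x - m) ** 2
              for rates, m in zip(rates_by_location, means)
              for x in rates)
    n_trials = sum(len(r) for r in rates_by_location)
    n_locations = len(rates_by_location)
    denom = (r_max - r_min) + 2.0 * math.sqrt(sse / (n_trials - n_locations))
    return (r_max - r_min) / denom if denom > 0 else 0.0

# Hypothetical unit with two target locations and two trials each.
example_ddi = discrimination_index([[9.0, 11.0], [1.0, 3.0]])
include_unit = example_ddi > 0.45  # inclusion criterion stated in the text
```

The index approaches 1 when the difference between the best and worst locations is large relative to trial-to-trial variability, and approaches 0 when responses are unselective or noisy.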
For assigning units a preferred target location in the peri-dw task, we quantified RFs offline by fitting a 2D Gaussian to the FRs during the delay period of the memory-saccade task:
$$\mathrm{FR}(\mathbf{x})=A\exp \left(-\frac{1}{2}{(\mathbf{x}-\boldsymbol{\mu })}^{\top }{\Sigma }^{-1}(\mathbf{x}-\boldsymbol{\mu })\right)$$
where FR is the firing rate during the delay period and x is a two-element vector that contains the x and y coordinates of the memory-saccade targets. The fitted parameters include A for amplitude, µ for the position of the RF center (x and y in degrees) and Σ, the 2 × 2 covariance matrix of the Gaussian. To reduce the number of fitted variables, we set the off-diagonal (covariance) terms of Σ to 0 and fit only the two variances. We then normalized the 2D Gaussian to convert it to a probability density and defined a unit’s preferred location as the one of the four choice-wager targets associated with the highest probability.
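The assignment of a preferred target from a fitted RF can be sketched in Python as follows. The sketch assumes an axis-aligned Gaussian (zero covariance), as described above, and uses the mean target positions reported for the main task as example coordinates; the RF parameters and function names are hypothetical.

```python
import math

def gaussian_density(x, y, mu, var):
    """Axis-aligned 2D Gaussian probability density at (x, y)."""
    zx = (x - mu[0]) ** 2 / var[0]
    zy = (y - mu[1]) ** 2 / var[1]
    return math.exp(-0.5 * (zx + zy)) / (2.0 * math.pi * math.sqrt(var[0] * var[1]))

def preferred_target(mu, var, targets):
    """Name of the target with the highest density under the fitted RF."""
    return max(targets, key=lambda name: gaussian_density(*targets[name], mu, var))

# Mean (x, y) target positions in degrees, taken from the task description.
targets = {
    "left-high": (-7.3, 7.3),
    "right-high": (7.3, 7.3),
    "left-low": (-6.9, -3.4),
    "right-low": (6.9, -3.4),
}
# Hypothetical fitted RF center (deg) and variances (deg^2).
label = preferred_target(mu=(-6.0, 5.0), var=(9.0, 9.0), targets=targets)
```

Because the normalization constant is shared across targets, the same label would be obtained from the unnormalized Gaussian; the normalized form is shown only to mirror the description in the text.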
For the population logistic decoders, we included all well-defined single neurons or multi-unit clusters based on careful spike sorting and manual curation. Previous studies have suggested that decision-related activity can be present even in neurons without spatially selective persistent activity74, and we aimed to maximize the sample size to facilitate single-trial decoding. This broader criterion increased the average number of units per session from 7.5 to 17.3 for monkey H and from 6.2 to 11.2 for monkey G.
Behavioral data analysis
We applied a logistic regression model to fit the proportion of rightward choices:
$${P}_{\mathrm{right}}=\frac{1}{1+\exp \left(-\left({\beta }_{0}+{\beta }_{1}\mathrm{Coh}+{\beta }_{2}\mathrm{Wager}+{\beta }_{3}\,\mathrm{Coh}\times \mathrm{Wager}\right)\right)}$$
where Pright is the probability of a rightward choice, Coh is signed motion coherence, Wager is the monkey’s bet (high/low), β0 is the overall bias, β1 estimates choice sensitivity, β2 captures any bias related to the bet (typically near zero) and β3 is the interaction term used to test whether choice sensitivity depends on wager. Fitting was done by minimizing the negative log-likelihood under a binomial distribution, using ‘fminsearch’ in MATLAB with the Nelder–Mead method.
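The objective minimized in such a fit can be written out as a short Python sketch. It assumes the standard logistic form with an interaction term and a 0/1 coding of the wager (low = 0, high = 1), neither of which is specified in detail above; the function names are hypothetical, and the optimizer itself (for example, a Nelder–Mead routine) is omitted.

```python
import math

def p_right(coh, wager, beta):
    """Probability of a rightward choice under the logistic model."""
    b0, b1, b2, b3 = beta
    z = b0 + b1 * coh + b2 * wager + b3 * coh * wager
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(beta, trials):
    """Binomial negative log-likelihood.

    trials: iterable of (signed coherence, wager in {0, 1}, chose_right in {0, 1}).
    """
    nll = 0.0
    for coh, wager, chose_right in trials:
        # Clip to avoid log(0) for extreme predictions.
        p = min(max(p_right(coh, wager, beta), 1e-12), 1.0 - 1e-12)
        nll -= math.log(p) if chose_right else math.log(1.0 - p)
    return nll

# Hypothetical trials: coherence as a signed fraction.
trials = [(0.512, 1, 1), (-0.512, 1, 0), (0.064, 0, 1), (-0.064, 0, 0)]
flat = neg_log_likelihood((0.0, 0.0, 0.0, 0.0), trials)
steep = neg_log_likelihood((0.0, 5.0, 0.0, 0.0), trials)
```

A positive β3 means that the psychometric function is steeper on high-bet trials, which is the signature of higher choice sensitivity when the animal wagers high.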
We used a modified Gaussian function to provide descriptive fits of the mean RT as a function of motion strength, as follows:
$$\mathrm{RT}(\mathrm{Coh})=A\exp \left(-\frac{{\mathrm{Coh}}^{2}}{2{\sigma }^{2}}\right)+b$$
where \(A\) is an amplitude term, Coh is the signed motion coherence, \(\sigma\) is the s.d. controlling the width of the Gaussian and \(b\) is a baseline term capturing the fastest mean RT. We fit this pseudo-Gaussian by minimizing the root-mean-square error.
To examine the relationship between accuracy and RT, as well as wager and RT, we calculated proportion correct and proportion of high bets grouped by RT, using nonoverlapping 100 ms time bins starting 100 ms after motion onset. To test whether the trend was significantly decreasing (during the time depicted in Fig. 1d), we used a Cochran–Armitage test for trend:
$$T=\sum _{b=1}^{k}{t}_{b}\left({n}_{1b}{R}_{2}-{n}_{2b}{R}_{1}\right)$$
$$\mathrm{Var}(T)=\frac{{R}_{1}{R}_{2}}{n}\left(\sum _{b=1}^{k}{t}_{b}^{2}{C}_{b}\left(n-{C}_{b}\right)-2\sum _{b=1}^{k-1}\sum _{c=b+1}^{k}{t}_{b}{t}_{c}{C}_{b}{C}_{c}\right)$$
where \({t}_{b}\) are the weights defining the trend (in our case linear, so t = (0, 1, 2, 3, 4, …)), k is the number of time bins, n is the total number of trials, \({n}_{ab}\) is the number of trials in group \(a\) (accuracy: correct or error; wager: high or low) at time point \(b\), \({R}_{a}\) is the total number of trials for group \(a\) irrespective of time, and \({C}_{b}\) is the total number of trials at time point \(b\) irrespective of group. Dividing \(T\) by \(\sqrt{\mathrm{Var}(T)}\) gives a test statistic, z, that can then be used to compute a P value.
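A compact Python sketch of the trend test is given below. It uses the standard textbook form of the Cochran–Armitage statistic with linear weights and a normal approximation for the P value; the function names and the example counts are hypothetical and are not data from the study.

```python
import math

def cochran_armitage(group1, group2, weights=None):
    """Cochran-Armitage trend statistic z for a 2 x k table.

    group1, group2: counts per time bin for the two groups
    (for example, correct and error trials, or high and low bets).
    """
    k = len(group1)
    t = list(range(k)) if weights is None else list(weights)
    col = [a + b for a, b in zip(group1, group2)]  # C_b
    r1, r2 = sum(group1), sum(group2)              # R_1, R_2
    n = r1 + r2
    stat = sum(t[i] * (group1[i] * r2 - group2[i] * r1) for i in range(k))
    var = (r1 * r2 / n) * (
        sum(t[i] ** 2 * col[i] * (n - col[i]) for i in range(k))
        - 2 * sum(t[i] * t[j] * col[i] * col[j]
                  for i in range(k - 1) for j in range(i + 1, k))
    )
    return stat / math.sqrt(var)

def lower_tail_p(z):
    """One-sided P value for a decreasing trend (normal approximation)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical counts of correct and error trials in four successive RT bins.
z = cochran_armitage([90, 80, 70, 60], [10, 20, 30, 40])
p = lower_tail_p(z)
```

A negative z indicates that the proportion of group 1 trials (for example, correct choices) declines across successive RT bins.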
Motion energy analysis
To estimate the temporal weighting of sensory evidence for choice and confidence, we used motion stimulus fluctuations to perform a psychophysical reverse correlation analysis on both choice and wager. We convolved each trial’s sequence of dots (a 3D array with the first two dimensions denoting the x and y coordinates of each dot, and the third dimension spanning the number of frames) with two pairs of quadrature spatiotemporal filters38. The filters were oriented to capture motion along the choice axis: 0° (rightward) and 180° (leftward). The convolved quadrature pairs were squared and summed to give the local motion energy for leftward and rightward directions. These local motion energies were then collapsed across space (first two dimensions) to derive the rightward and leftward motion energy provided by the stimulus through time, which we then subtracted (right minus left) to obtain the net motion energy36,38. To strictly examine the fluctuations around the mean and mitigate potential effects of coherence, we subtracted the mean motion energy from each trial, conditioned on signed coherence, over time. As most meaningful fluctuations occur at low coherences, we also restricted the analysis to 0%, 3.2% and 6.4% coherence trials.
For the choice kernels (Fig. 2a,b), we simply averaged the net motion energy time series for all trials with a right or left choice. To evaluate statistical significance, we used a Wilcoxon signed-rank test comparing the motion energy profiles for right and left choices at each time point, applying a Šidák correction for the number of time samples. For the confidence kernels (Fig. 2c,d), we instead subtracted the motion energy profiles for high versus low wager trials conditioned on a given choice (correct trials only) and computed the s.e. of the difference (Fig. 2c,d (shaded area around traces in bottom)). To assess significance, we compared the confidence kernel distribution relative to a value of zero using a Wilcoxon rank-sum test with Šidák correction for the number of time samples.
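Because the tests are repeated at every time sample, the Šidák correction adjusts the per-test threshold so that the family-wise error rate stays at the nominal level. A minimal Python sketch of the correction follows; the function names are hypothetical, and the number of time samples in the example is arbitrary.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

def sidak_adjust(p_value, n_tests):
    """Equivalent adjustment applied to a single P value."""
    return 1.0 - (1.0 - p_value) ** n_tests

# Example: 10 time samples tested at a family-wise alpha of 0.05.
threshold = sidak_alpha(0.05, 10)
significant = [p < threshold for p in (0.001, 0.004, 0.02)]
```

The two functions are equivalent ways of applying the same correction: a P value falls below the corrected threshold exactly when its adjusted value falls below the original alpha.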
Changes of mind (CoMs) and changes of confidence
Raw eye position data (sampled at 1000 Hz) were converted to velocity and smoothed by applying a third-order low-pass Butterworth filter with a cutoff frequency of 75 Hz. Eye position is a 2D vector containing x and y positions in degrees, while eye velocity is defined as a 1D vector that combines the velocities for both directions (x and y). To preprocess the data, we first calculated a stricter time for saccade onset by applying a threshold of 20° s−1 onto the smoothed velocity data. Subsequently, we centered the eye positions by subtracting the average of the last 5 ms before saccade onset. The initial choice and wager were determined by the quadrant of the screen where the eyes were located 5 ms after saccade detection. The final choice and wager corresponded to the target at which the eye position settled within a 0.1-s grace period after the initial saccade. This provided an initial and final choice and wager for every trial, allowing for simple analyses like those in Fig. 2e–j. The CoM frequencies (Fig. 2e,f,h,i) were conditioned on the initial outcome; therefore, the frequencies reflect not the proportion of all trials but only trials that initially reached a given outcome (error, correct, low and high). For Fig. 2g,j, the probabilities reflect all trials; hence, the values are much smaller than on the other plots.
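The saccade-onset and initial-outcome steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the low-pass Butterworth smoothing is omitted, speed is taken as the Euclidean norm of the x and y velocities, and the trace in the example is synthetic. Function names are hypothetical.

```python
import math

def eye_speed(xs, ys, fs=1000.0):
    """Eye speed (deg/s) from position samples (deg) recorded at fs Hz."""
    return [math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) * fs
            for i in range(len(xs) - 1)]

def saccade_onset(speeds, threshold=20.0):
    """Index of the first sample whose speed exceeds the threshold, or None."""
    for i, s in enumerate(speeds):
        if s > threshold:
            return i
    return None

def initial_outcome(xs, ys, onset, fs=1000.0, pre_ms=5, post_ms=5):
    """Initial choice and wager from the quadrant occupied shortly after onset."""
    pre = int(round(pre_ms * fs / 1000.0))
    post = int(round(post_ms * fs / 1000.0))
    if onset < pre or onset + post >= len(xs):
        raise ValueError("not enough samples around saccade onset")
    # Center on the mean position over the last few ms before onset.
    x0 = sum(xs[onset - pre:onset]) / pre
    y0 = sum(ys[onset - pre:onset]) / pre
    x = xs[onset + post] - x0
    y = ys[onset + post] - y0
    return ("right" if x > 0 else "left", "high" if y > 0 else "low")

# Synthetic trace: 50 ms of fixation, then an up-right movement.
xs = [0.0] * 50 + [0.3 * (i + 1) for i in range(30)]
ys = [0.0] * 50 + [0.3 * (i + 1) for i in range(30)]
onset = saccade_onset(eye_speed(xs, ys))
choice, wager = initial_outcome(xs, ys, onset)
```

Comparing this initial outcome with the target on which the eyes finally settle is what identifies changes of mind and changes of confidence on a given trial.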
Parallel model
To formalize the hypothesis of parallel deliberation for choice and confidence, we used a 2D bounded accumulator model12, also known as an anticorrelated race. We adapted a recently developed family of closed-form solutions for a 2D correlated diffusion process92 to facilitate fitting of the parameters. By conceptualizing the diffusion process as a Gaussian distribution originating from the third quadrant on a plane with two absorbing bounds, one can use the method of images to calculate the propagation of the probability density of the diffusing particle, that is, the solution to the Fokker–Planck equation. The constraint making this solution possible restricts the correlation to a discrete set of values, governed by the number of images:
$$\rho =-\cos \left(\frac{\pi }{I}\right)$$
where \(\rho\) is the correlation value and \(I\) is the number of images. We selected \(I=4\), or \(\rho =-0.7071\), for consistency with previous studies12,13.
Specifically, the method of images yields \(P\left({v}_{\mathrm{right}},{v}_{\mathrm{left}} \mid C,t\right)\), describing the probability of the accumulator being in a particular position at time \(t\) for coherence \(C\). The probability of making a right choice is given by:
where \(B\) is the bound for terminating the accumulation process. To obtain the decision time (DT) distribution, we calculated the difference in the survival probability as follows:
thereby providing the change in probability at each survival timestep, that is, the probability of crossing a bound. The RT is equal to the DT plus sensory and motor delays, referred to as nondecision time or ‘nonDT’, and is obtained by convolution as follows:
where \({P}_{\mathrm{non}\mathrm{DT}}\left(z\right)\) is modeled as a Gaussian distribution with mean \(\left({\mu }_{\mathrm{non}\mathrm{DT}}\right)\) and s.d. \(\left({\sigma }_{\mathrm{non}\mathrm{DT}}\right)\), and z is a time variable indicating the epoch contributing to the convolution.
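The two steps described above (differencing the survival probability to obtain the DT distribution, then convolving with a Gaussian nondecision time) can be illustrated with a discrete-time Python sketch. It is not the fitting code used in the study; the survival values, the time step and the nondecision-time parameters in the example are hypothetical.

```python
import math

def decision_time_pmf(survival):
    """Probability of crossing a bound in each time step.

    survival[k] is the probability that no bound has been crossed by step k.
    """
    return [survival[k - 1] - survival[k] for k in range(1, len(survival))]

def gaussian_pmf(mean_s, sd_s, step_s, n_steps):
    """Discretized, renormalized Gaussian for the nondecision time."""
    weights = [math.exp(-0.5 * ((k * step_s - mean_s) / sd_s) ** 2)
               for k in range(n_steps)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# Hypothetical survival function sampled every 10 ms.
survival = [1.0, 0.8, 0.5, 0.2, 0.0]
dt_pmf = decision_time_pmf(survival)
non_dt = gaussian_pmf(mean_s=0.3, sd_s=0.05, step_s=0.01, n_steps=61)
rt_pmf = convolve(dt_pmf, non_dt)
```

Because both inputs are normalized, the resulting RT distribution also sums to 1, and its mean equals the sum of the mean DT and the mean nondecision time on the discrete grid.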
To calculate the probability of betting high, we first computed the log odds of a correct choice as a function of the state of the losing accumulator12, as follows:
where \({v}_{\mathrm{incorrect}}\) is the incorrect accumulator (not matching the sign of coherence) and \({v}_{\mathrm{correct}}\) is the correct accumulator (matching the sign of coherence). This transformation provides a graded scale for confidence that can be transformed into (high, low) wager responses by applying a cutoff value \((\theta )\) that imposes binary outcomes. To obtain the probability of a high bet, we computed:
where \(\log \,\mathrm{odds} < \theta\) indicates integration over the area in which \(\log \,\mathrm{odds}\) is less than the cutoff value.
We found that the fits were improved with two subtle changes from the model used previously12,13, although they did not affect the conclusions regarding serial versus parallel. The first is an extra parameter (\({\tau }_{\mathrm{cut}}\)) that acted as a cutoff marking when confidence no longer depended on elapsed time but only on the amount of evidence accumulated. This relaxes the assumption of an optimal mapping from DV to confidence, which is time-dependent due to marginalization over a mixture of motion strengths28. Second, having observed occasional CoM (Fig. 2e–j), we considered the possibility that the brief postdecision epoch that mediates CoMs also allows some temporal flexibility in the assignment of confidence. Specifically, we found that the wager-conditioned RT distributions were better fit when conditioning on the wager that would have been made at the end of the nondecision time, rather than at DT. This result defies simple interpretation under the current modeling framework, but it may point toward an interesting target for more complex models in future work.
We obtained the best-fitting parameters to the model by using the joint probability of choice, RT and wager as follows:
We calculated the negative log-likelihood of this by:
The likelihood calculation thus optimizes for choice and RT conditioned on wager (Fig. 3 (left and middle columns)), but the wager itself is unconditioned, such that the split between correct and error trials in Fig. 3 (right column, green versus purple) is a prediction, not a fit. The full model had seven or eight free parameters, differing slightly between the two animals. These included drift rate \(\left(K\right)\), bound \(\left(B\right)\), mean nondecision time \(\left({\mu }_{\mathrm{non}\mathrm{DT}}\right)\), confidence time cutoff \(({\tau }_{\mathrm{cut}})\), wager offset \(({b}_{w})\) and log odds cutoff \((\theta )\). Monkey H required separate nondecision time means for left and right choices, and a ‘wager offset’ \(({b}_{w})\) capturing a tendency to occasionally bet low across the board, even for the highest coherences (akin to a lapse rate). In contrast, monkey G tended to ‘lapse’ (bet low more often than expected) only for weaker motion strengths, requiring an ad-hoc gain factor applied to the \(P\left(\mathrm{high}\right)\) distribution in a coherence-dependent manner, parameterized by \(\mathrm{gain}=1+\alpha \times (1-\epsilon \times \left|\mathrm{coherence}\right|)\). Nondecision time s.d. \(({\sigma }_{\mathrm{non}\mathrm{DT}})\) was not a free parameter and was instead established using psychophysical kernels13. Fitting was done using MATLAB’s built-in function ‘fminsearch’, combined with a grid search over 30 different starting points.
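Two of the quantities described above are simple enough to sketch directly: a negative log-likelihood summed over per-trial joint probabilities, and the coherence-dependent gain factor. This is an illustration under stated assumptions rather than the fitting code itself. The function names, the probability floor and the values of \(\alpha\) and \(\epsilon\) are hypothetical, and coherence is assumed to be expressed as a proportion.

```python
import math

def coherence_gain(coherence, alpha, epsilon):
    """gain = 1 + alpha * (1 - epsilon * |coherence|), as written in the text."""
    return 1.0 + alpha * (1.0 - epsilon * abs(coherence))

def negative_log_likelihood(trial_probs, floor=1e-12):
    """Sum of -log(p) over trials; `floor` is an assumption that guards against log(0)."""
    return -sum(math.log(max(p, floor)) for p in trial_probs)

# Illustrative parameter values only; these are not the fitted parameters.
gain_weak = coherence_gain(0.0, alpha=-0.2, epsilon=2.0)    # 0% coherence
gain_strong = coherence_gain(0.5, alpha=-0.2, epsilon=2.0)  # hypothetical 50% coherence

# Hypothetical per-trial joint probabilities of the observed choice, RT and wager.
nll = negative_log_likelihood([0.5, 0.25, 0.125])
```

With these illustrative values the gain is below 1 at 0% coherence and returns to 1 at the stronger coherence, so a gain of this form acts selectively on weak-motion trials.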
Serial model
The serial model was constructed as a sequence of two 1D DDMs, one for choice followed by a second for confidence (Fig. 1f). The six main parameters were drift rate \(\left(K\right)\), choice bound \(\left({B}_{c}\right)\), high wager bound \(\left({B}_{h}\right)\), low wager bound \(\left({B}_{l}\right)\), mean nondecision time \(\left({\mu }_{\mathrm{non}\mathrm{DT}}\right)\) and a linear urgency signal for only the confidence accumulator \(({u}_{M,\mathrm{Conf}})\). Parameter estimation largely followed the same logic as for the parallel model, using the joint distribution of choice, RT and wager to fit the data, with a few minor differences. Computing the probability of rightward choice was similar to the parallel model and used the following formula:
where \(v\) is the DV for the first 1D accumulator, and the other parameters are similar to the parallel model. The DT distribution was calculated as:
where \({P}_{\mathrm{choice}}\) is the probability of hitting a bound at time t for the first accumulator and \({P}_{\mathrm{wager}}\) is the probability of hitting a bound at time t for the second accumulator. For RT, we followed the same procedure as the previous model. Finally, to calculate the probability of betting high, we used the following equation:
Hybrid model
The hybrid model was formalized as a 2D race, thereby also using the closed-form solution92. Given the similarity to the parallel model, all variables were computed using the same formulae, except for an additional convolution step to compute the distribution of DT, as well as the extra time used for the wager. We defined ‘wager time’ (WT) as the period that the losing accumulator is allowed to continue after the winning accumulator reaches its bound. The total DT is formulated as:
where DT represents the time for both choice and wager and WT strictly represents the extra time for confidence. The RT is then computed by convolving \(P(\mathrm{DT}+\mathrm{WT})\) with the nondecision time, as in equation (9). Subsequently, \({P}_{\mathrm{wager}}\) and its dependencies are computed using \(P(\mathrm{DT}+\mathrm{WT})\) instead of \(P(\mathrm{DT})\).
Analysis of neural data based on the preferred and chosen target
Two populations of neurons were established based on the RF overlap of either the left-high or left-low target. We computed four normalized FR responses for both populations of neurons, where each response corresponds to one of the four chosen targets (Fig. 4b,c). To combine across neurons, we first detrended the responses of each neuron over the coherences of interest (−6.4%, −3.2%, 0%, 3.2%, 6.4%). Then the FRs for each neuron (after smoothing each trial with a 0.1-s exponential filter) were averaged over all trials that met the following three conditions: (1) matched the chosen target of interest, (2) contained a coherence of interest and (3) had an RT longer than 0.3 s. Example units in Fig. 4a were not detrended or normalized, but otherwise the procedure was the same. The colored bars at the top of Fig. 4b,c indicate statistical significance based on a one-tailed Wilcoxon rank-sum test evaluating whether the activity preceding a choice of the RF-aligned target was greater than each of the other three targets, indicated by the color. Given that the testing is done over multiple time points, the one-tailed Wilcoxon rank-sum test α value was adjusted using Šidák’s correction.
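For reference, Šidák's correction sets the per-test significance level to \(1-{(1-\alpha )}^{1/m}\) for \(m\) tests, which holds the family-wise error rate at \(\alpha\) when the tests are independent. A short sketch follows; the number of time points in the example is hypothetical.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at `alpha`."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

# Hypothetical example: 30 time points tested at a family-wise alpha of 0.05.
alpha_per_test = sidak_alpha(0.05, 30)
```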
Signatures of accumulator dynamics in FR variance and autocorrelation
An accumulation of noisy evidence produces characteristic variance and autocorrelation features that can be estimated from single neurons48,49. Applying the law of total variance to a doubly stochastic process, the variance in spike count for a given time bin is the sum of the variance of the underlying latent rate, termed VarCE, and the residual variance expected if the latent rate were constant, known as point process variance (PPV). To calculate VarCE, one must subtract out the PPV from the total measured variance. To do this, we made the following two standard assumptions49: (1) the observed spiking of a neuron follows a stochastic point process mediated by some rate parameter, and (2) at each time bin the PPV is proportional to the mean count:
Here \({Y}_{i}\) represents the random variable capturing the neuron’s spike count at time point i, \({X}_{i}\) is the random variable for the latent rate at time point i and \(\varphi\) is a constant that is fitted to maximize how well the observed FRs match an accumulation of independent identically distributed (IID) random numbers. \(E\left[{n}_{i}\right]\) is the mean spike count at time point i. In addition, it follows that the law of total covariance is described using a similar equation:
The first term on the right-hand side is known as the covariance of conditional expectations (CovCE), which is needed to compute the CorCE. The second term is the expectation of conditional covariance, and its diagonal is the PPV. To calculate the CorCE, we made another assumption that when \(i\ne j\), the expectation of conditional covariance is zero because the variance from the point process should be independent across time points49 (although this may not be strictly true for adjacent time bins due to their shared interspike interval). This simplification makes it possible to state that the CovCE, for \(i\ne j\), is equal to the measured covariance, and the diagonal of the CovCE is then the VarCE. It follows that, to calculate the CorCE, one must simply divide the CovCE by the square root of the product of the corresponding VarCEs:
where i and j are time points. The best-fitting φ is then calculated by comparing the empirical CorCE estimates with theoretical or simulated correlation values under the hypothesized generative process (for example, accumulation of IID samples).
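The relationships among total variance, PPV, VarCE and CorCE can be made concrete with a small numerical sketch. It assumes, as described above, that the PPV equals \(\varphi\) times the mean count and that off-diagonal CovCE terms equal the measured covariance. The spike counts and the value of \(\varphi\) are hypothetical, and the helper names are illustrative.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def var_ce(counts, phi):
    """VarCE = total variance minus the PPV, with PPV = phi * mean count."""
    return covariance(counts, counts) - phi * mean(counts)

def cor_ce(counts_i, counts_j, phi):
    """CorCE for i != j: measured covariance over sqrt(VarCE_i * VarCE_j)."""
    return covariance(counts_i, counts_j) / math.sqrt(
        var_ce(counts_i, phi) * var_ce(counts_j, phi)
    )

# Hypothetical spike counts in two time bins across six trials.
bin_1 = [2, 9, 4, 12, 1, 8]
bin_2 = [6, 7, 3, 10, 5, 5]
r = cor_ce(bin_1, bin_2, phi=0.3)
```

Note that a \(\varphi\) that is too large shrinks the VarCE terms and can push the CorCE above 1, which is one reason \(\varphi\) has to be fitted rather than fixed.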
We tested two theoretical autocorrelation patterns, one pertaining to a standard drift-diffusion process and the other to a delayed drift-diffusion process (Fig. 4d). The standard accumulation of IID random numbers was calculated using the following equation:
We used six different time points, giving 15 unique combinations (i = 1:6 and j = 1:6, i ≠ j). For the delayed accumulation of IID random numbers, we used a simulation that accumulated noisy normalized samples of numbers with mean (0.717, 0, –0.717). We narrowed the simulation to the first six timesteps to compare it to the results from the standard accumulation process. In addition, the delay component was constructed by uniformly sampling a value between 1 and 6, indicating when the accumulation would begin. Using 10,000 trials, we calculated the autocorrelation of the first six timesteps for this simulation, providing 15 unique combinations (i = 1:6 and j = 1:6, i ≠ j). In addition, we fit \(\varphi\) for both models according to the following steps: (1) calculate the \(E\left[{n}_{i}\right]\), \(\mathrm{Var}\left({Y}_{i}\right)\) and \(\mathrm{Cov}\left({Y}_{i},{Y}_{j}\right)\) from observed spikes; (2) compute \(\mathrm{VarCE}({Y}_{i})\) using an initial value of \(\varphi =1\); (3) calculate \(\mathrm{CorCE}({Y}_{i},{Y}_{j})\) under the assumptions mentioned above; (4) calculate the mean squared error (MSE) between the empirical \(\mathrm{CorCE}({Y}_{i},{Y}_{j})\) and the theoretical/simulated autocorrelation values, \({\rho }_{i,j}\); and (5) iteratively update \(\varphi\) until the MSE between the \(\mathrm{CorCE}({Y}_{i},{Y}_{j})\) and \({\rho }_{i,j}\) reached the global minimum.
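The two autocorrelation patterns can be sketched as follows. For an unbounded running sum of IID samples, the correlation between the sums after \(i\) and \(j\) steps is \(\sqrt{\min (i,j)/\max (i,j)}\), a standard result. The simulation below is a simplified stand-in for the one described above: it uses zero-mean, unit-variance Gaussian samples (an assumption) rather than the three sample means listed in the text, and all function names are illustrative.

```python
import math
import random
from itertools import combinations

def diffusion_correlation(i, j):
    """Correlation between running sums of IID samples after i and j steps."""
    return math.sqrt(min(i, j) / max(i, j))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_paths(n_trials, n_steps, delayed, rng):
    """Running sums of Gaussian samples; if `delayed`, accumulation begins at a
    step drawn uniformly from 1..n_steps (inclusive) on each trial."""
    paths = []
    for _ in range(n_trials):
        start = rng.randint(1, n_steps) if delayed else 1
        total, path = 0.0, []
        for step in range(1, n_steps + 1):
            if step >= start:
                total += rng.gauss(0.0, 1.0)
            path.append(total)
        paths.append(path)
    return paths

def pairwise_correlations(paths, pairs):
    """Correlation across trials between the accumulated values at steps i and j."""
    columns = list(zip(*paths))
    return {(i, j): pearson(columns[i - 1], columns[j - 1]) for i, j in pairs}

rng = random.Random(0)                      # fixed seed for reproducibility
pairs = list(combinations(range(1, 7), 2))  # the 15 unique pairs with i < j
theory = {(i, j): diffusion_correlation(i, j) for i, j in pairs}
standard = pairwise_correlations(simulate_paths(10000, 6, False, rng), pairs)
delayed = pairwise_correlations(simulate_paths(10000, 6, True, rng), pairs)
```

In this simplified setting the delayed process yields weaker correlations between early and late time steps than the standard process, which is the feature that lets the two patterns be distinguished.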
We used 6 × 60-ms time bins spanning a 360-ms window that began 150–170 ms and ended 510–530 ms after motion onset. We applied the analysis to trials with coherences of –6.4%, –3.2%, 0%, 3.2% or 6.4% and RTs of at least 630 ms to minimize bound effects. To combine across neurons, we calculated the mean response for each time bin of each neuron across all trials and subtracted that from the mean response for each time bin conditioned on the signed coherence. This gives a matrix of residuals of size ([Nneuron × Ntrials] × Ntime bins), which is then used to calculate a covariance matrix. Next, the VarCE is calculated by substituting the raw variance for the diagonal in the covariance matrix, as the diagonal represents the normalized population variance, and \(\varphi\) is initialized at a value of 1. To calculate the empirical correlation values, each entry in the covariance matrix is divided by \(\sqrt{\mathrm{VarCE}\left({Y}_{i}\right)\times \mathrm{VarCE}({Y}_{j})}\). Finally, using a fitting procedure, we compare Fisher’s z transformation of the empirical correlation with a z-transformation of the ideal correlation and minimize the MSE of the correlation matrix.
Statistical testing for differences between standard and delayed accumulation was done using a leave-one-out cross-validation method. The metric used for validation was the mean absolute percentage error (MAPE). MAPE allowed for comparison between the two models because the models had different dependent variables (\({\rho }_{i,j}\)). This method provided 15 different MAPE values for each model, which were then compared using a two-tailed Wilcoxon signed-rank test. The model with the lower percentage error distribution better captured the underlying autocorrelation structure of the data (Fig. 4e).
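As a reminder of the metric, MAPE averages the absolute error expressed as a fraction of a reference value. The sketch below takes the first argument as the reference; which quantity served as the reference in the analysis is an assumption here, and the example values are hypothetical.

```python
def mape(reference, predicted):
    """Mean absolute percentage error, in percent, relative to `reference`."""
    errors = [abs((r - p) / r) for r, p in zip(reference, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical correlation values for two (i, j) pairs.
example_mape = mape([0.5, 0.8], [0.45, 0.88])
```

Because each error is scaled by its own reference value, MAPE is unitless and can be compared across models whose target values differ in magnitude.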
For Extended Data Fig. 4, we recalculated the VarCE but used 6 × 100-ms time bins spanning from 50 ms before motion onset to 550 ms after motion onset. Here we instead applied the analysis to all trials irrespective of coherence. The CorCE results and preference for standard over delayed accumulation did not differ when using all the coherences. We calculated the shaded error bars using a bootstrap method with 500 resamples.
Population decoding
Data were preprocessed by calculating spike counts in 100-ms time windows, stepping every 20 ms, through the first 600 ms after motion onset, and again separately for spikes aligned to saccade onset (from 400 ms before to 200 ms after). We included only trials with RT > 400 ms to minimize edge effects that may obscure single-trial dynamics. We used two L2-regularized logistic decoders, one for choice and one for wager:
where w represents a vector of weights at time t (vector length matches number of units), \({{w}}_{0}\) is a bias term and \(X\) is the spike count vector for each unit in the session. The objective was penalized by \(\xi||w||^{2}\) with the best hyperparameter ξ determined using a 50-value grid search between 0 and 1,000, using fivefold cross-validation. The dataset was divided into a training set (90%) and test set (10%), the latter of which was used to calculate the prediction accuracy (Fig. 5b). If \({P}_{t}\) was greater than 0.5, this would indicate that the decoder predicted either a right choice (for the choice decoder) or high bet (for the wager decoder). Values below 0.5 would indicate either a left choice or low bet. The performance (accuracy) was defined based on the monkey’s choice and wager at the end of the trial. We define a ‘neural DV’50,51 (aka model DV) by computing the log odds of a particular choice (for example, \(\log (\frac{P\left(\mathrm{right}\right)}{1-P\left(\mathrm{right}\right)})\) for rightward choices) given the population spike counts up to time t on a given trial (correct trials only). Notably, for Fig. 5a,c, we used the log odds irrespective of choice; therefore, the results combine \(\log (\frac{P\left(\mathrm{right}\right)}{1-P\left(\mathrm{right}\right)})\) for right choices and \(\log (\frac{1-P\left(\mathrm{right}\right)}{P\left(\mathrm{right}\right)})\) for left choices, to increase statistical power. For Fig. 5c the log odds for wager are also combined in the following manner: \(\log (\frac{P\left(\mathrm{high}\right)}{1-P\left(\mathrm{high}\right)})\) for high wagers and \(\log (\frac{1-P\left(\mathrm{high}\right)}{P\left(\mathrm{high}\right)})\) for low wagers.
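The decoder readout and the sign convention for the neural DV can be sketched as follows, assuming the standard logistic form \(P=1/(1+\exp (-(w\cdot X+{w}_{0})))\). The weights, bias and spike counts are hypothetical, and the function names are illustrative.

```python
import math

def decoder_probability(weights, bias, counts):
    """Logistic readout of a population spike-count vector."""
    activation = bias + sum(w * x for w, x in zip(weights, counts))
    return 1.0 / (1.0 + math.exp(-activation))

def signed_log_odds(p, chose_positive):
    """Log odds in favor of the option the animal actually selected.

    `chose_positive` is True for a right choice (or high wager) and False for
    a left choice (or low wager), matching the combination rule in the text.
    """
    log_odds = math.log(p / (1.0 - p))
    return log_odds if chose_positive else -log_odds

# Hypothetical weights and spike counts for three units in one time window.
p_right = decoder_probability([0.4, -0.2, 0.1], bias=-0.5, counts=[3, 1, 2])
dv_if_right_chosen = signed_log_odds(p_right, chose_positive=True)
dv_if_left_chosen = signed_log_odds(p_right, chose_positive=False)
```

Flipping the sign for left choices (or low wagers) puts all trials on a common axis, where positive values indicate decoder support for the option that was eventually selected.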
To test whether the neural DV for the choice decoder showed a linear increase that was significantly dependent on motion strength, we fit a linear regression:
where \({\beta }_{0}\) is a bias term, \(T\) is time (20-ms time bins for 200–600 ms after motion onset), \(\mathrm{Coh}\) is motion coherence level and \({\beta }_{\mathrm{1,2,3}}\) are the weights accompanying the predictor variables. If \({\beta }_{3}\) was significantly different from zero \((P < 0.05)\), then the modulation of log odds by motion strength was deemed significant. To compute this linear regression, we used only the mean \(\mathrm{DV}\) shown in Fig. 5a, with data cut off at the mean RT for each individual coherence.
As mentioned in the Results, both decoders provide weights for each neuron through time. Therefore, to test whether our sample of neurons comprised a single population that contributes approximately equally to choice and wager, we calculated Pearson’s correlation between the weight magnitudes (irrespective of sign) and the distribution of the difference in weight magnitude. We collapsed over time by averaging over the first 600 ms after motion onset and the last 400 ms before saccade onset (Extended Data Fig. 5a (red line)). We tested for significance by randomly permuting the choice and wager weights 1,000 times (Extended Data Fig. 5a, blue distribution). In addition, the choice and wager decoder absolute weights were subtracted from one another to create a distribution that informs whether there is a single population equally contributing to both choice and wager, based on a test for bimodality (Hartigan’s dip test).
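A permutation test of this kind can be sketched as follows. This is one plausible reading of the procedure, not the analysis code: the pairing between units is shuffled for one weight vector, the test is one-sided, and the add-one correction to the P value is an assumption. The weights are hypothetical.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm, rng):
    """One-sided P value for the observed correlation under shuffled pairing."""
    observed = pearson(x, y)
    shuffled = list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the unit-by-unit pairing
        if pearson(x, shuffled) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)

# Hypothetical time-averaged decoder weights for eight units.
choice_w = [0.9, -0.1, 0.4, -0.7, 0.2, -1.1, 0.05, 0.6]
wager_w = [-0.8, 0.2, 0.5, 0.6, -0.1, 1.0, 0.1, -0.5]

# Correlate magnitudes, irrespective of sign.
abs_choice = [abs(w) for w in choice_w]
abs_wager = [abs(w) for w in wager_w]
r_obs, p_val = permutation_p_value(abs_choice, abs_wager, 1000, random.Random(1))
```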
To determine the trial-by-trial relationship between the choice and wager decoder, we first tested whether the wager decoder was predictive of the DV (log odds). Trials were categorized as decoded-low or decoded-high confidence by calculating the mean \(P(\mathrm{high})\) for the wager decoder from 200 ms before saccade initiation until 100 ms after. Values above (below) 0.5 indicated a decoded-high (decoded-low) wager. We focused strictly on trials with 0% coherence to minimize any effects of strong coherent sensory input. Results for Fig. 6a were calculated by then averaging the DV on these decoded-high and decoded-low trials. Significance bars were calculated using a one-tailed Wilcoxon rank-sum test with Šidák’s correction.
To examine the relationship between the updating of choice and confidence at a finer time scale, we related the log odds of the choice decoder to the P(high) from the wager decoder on each individual trial, focusing on three separate 400-ms time windows as described in the legend. The independent variable, \(P(\mathrm{right})\) from the choice decoder, was corrected so that values near 0.5 (chance level) were close to zero and values moving away from 0.5 in either direction (better predicting left or right choice) became closer to one. This was done using the following equation:
In essence, this transformation changes the property of choice decoding to depend only on distance from chance, reflecting signal strength or magnitude, making it possible for a linear relationship to exist between corrected P(right) and \(P(\mathrm{high})\). To capture this relationship and its dependency on time lag, we regressed P(high) at time \(t+\Delta t\) on the corrected P(right) at time \(t\), where \(\Delta t\) ranged from −200 to +200 ms. We quantified the strength of the relationship, separately for each time lag (Fig. 6e–g), as the corrected \({R}^{2}\), which is simply the \({R}^{2}\) value after subtracting the average \({R}^{2}\) value obtained when the time series of the decoders for each trial were randomly permuted.
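The lagged regression and its permutation correction can be sketched for a single hypothetical trial. The mapping \(2\times \left|P(\mathrm{right})-0.5\right|\) is an assumption that satisfies the description above (0 at chance, 1 at certainty) and may not be the exact expression used. Shuffling time points within a trial is likewise only one reading of the permutation step, and all names and values are illustrative.

```python
import random

def corrected_p(p):
    """Distance of P(right) from chance: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return 2.0 * abs(p - 0.5)

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def lagged_r_squared(x, y, lag):
    """R^2 for y[t + lag] regressed on x[t], using only overlapping samples."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    return r_squared(xs, ys)

def permutation_corrected_r_squared(x, y, lag, n_perm, rng):
    """Raw R^2 minus the mean R^2 obtained with a time-shuffled y."""
    raw = lagged_r_squared(x, y, lag)
    shuffled = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(lagged_r_squared(x, shuffled, lag))
    return raw - sum(null) / n_perm

# Hypothetical single-trial decoder outputs evolving toward a left choice.
p_right = [0.5, 0.45, 0.4, 0.3, 0.2, 0.15, 0.1, 0.1]
strength = [corrected_p(p) for p in p_right]
p_high = [0.5 + 0.4 * s for s in strength]  # built to track strength at zero lag

r2_zero_lag = lagged_r_squared(strength, p_high, 0)
r2_one_lag = lagged_r_squared(strength, p_high, 1)
r2_corrected = permutation_corrected_r_squared(
    strength, p_high, 0, 200, random.Random(2)
)
```

In this constructed example the relationship is exact at zero lag and weaker at a lag of one sample, mirroring the logic of looking for a peak in corrected \({R}^{2}\) as a function of lag.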
Statistics and reproducibility
No statistical methods were used to predetermine sample sizes, but our sample size of two monkeys is standard in the field. The number of recorded neurons per animal is also consistent with field-specific conventions and sufficient for reliable inference with the types of analyses performed. No data were excluded from the analyses except for reasons of neuronal selectivity as noted above. Nonparametric tests were used rather than assuming or testing for normality. The investigators were not blinded to allocation during experiments and outcome assessment. Experimental conditions were presented in randomized order to prevent monkeys from predicting upcoming trials.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data that support the findings of this study are available from the corresponding author upon request. The authors are conducting further analyses of the data. The data will be made publicly available once this has been completed.
Code availability
Sample model-fitting code is available at Code Ocean https://doi.org/10.24433/CO.6135339.v1. Code needed to reproduce other figures and analyses will be made available upon request.
References
Daniel, R. & Pollmann, S. Striatal activations signal prediction errors on confidence in the absence of external feedback. Neuroimage 59, 3457–3467 (2012).
Sarafyazd, M. & Jazayeri, M. Hierarchical reasoning by neural circuits in the frontal cortex. Science 364, eaav8911 (2019).
Purcell, B. A. & Kiani, R. Hierarchical decision processes that operate over distinct timescales underlie choice and changes in strategy. Proc. Natl Acad. Sci. USA 113, E4531–E4540 (2016).
Van den Berg, R., Zylberberg, A., Kiani, R., Shadlen, M. N. & Wolpert, D. M. Confidence is the bridge between multi-stage decisions. Curr. Biol. 26, 3157–3168 (2016).
Zylberberg, A. Decision prioritization and causal reasoning in decision hierarchies. PLoS Comput. Biol. 17, e1009688 (2021).
Rescorla, R. A. & Wagner, A. R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. in Classical Conditioning II (eds Black, A. H. & Prokasy, W. F.) 64–99 (Appleton-Century-Crofts, 1972).
Drugowitsch, J., Mendonça, A. G., Mainen, Z. F. & Pouget, A. Learning optimal decisions with confidence. Proc. Natl Acad. Sci. USA 116, 24872–24880 (2019).
Fleming, S. M. Metacognition and confidence: a review and synthesis. Annu. Rev. Psychol. 75, 241–268 (2024).
Pleskac, T. J. & Busemeyer, J. R. Two-stage dynamic signal detection: a theory of choice, decision time, and confidence. Psychol. Rev. 117, 864–901 (2010).
Moran, R., Teodorescu, A. R. & Usher, M. Post choice information integration as a causal determinant of confidence: novel data and a computational account. Cogn. Psychol. 78, 99–147 (2015).
Herregods, S., le Denmat, P. & Desender, K. Modelling speed-accuracy tradeoffs in the stopping rule for confidence judgments. Preprint at bioRxiv https://doi.org/10.1101/2023.02.27.530208 (2024).
Kiani, R., Corthell, L. & Shadlen, M. N. Choice certainty is informed by both evidence and decision time. Neuron 84, 1329–1342 (2014).
Van den Berg, R. et al. A common mechanism underlies changes of mind about decisions and confidence. eLife 5, e12192 (2016).
Khalvati, K., Kiani, R. & Rao, R. P. N. Bayesian inference with incomplete knowledge explains perceptual confidence and its deviations from accuracy. Nat. Commun. 12, 5704 (2021).
Desender, K., Donner, T. H. & Verguts, T. Dynamic expressions of confidence within an evidence accumulation framework. Cognition 207, 104522 (2021).
Maniscalco, B. et al. Tuned inhibition in perceptual decision-making circuits can explain seemingly suboptimal confidence behavior. PLoS Comput. Biol. 17, e1008779 (2021).
Hellmann, S., Zehetleitner, M. & Rausch, M. Simultaneous modeling of choice, confidence, and response time in visual perception. Psychol. Rev. 130, 1521–1543 (2023).
Le Denmat, P., Verguts, T. & Desender, K. A low-dimensional approximation of optimal confidence. PLoS Comput. Biol. 20, e1012273 (2024).
Balsdon, T., Wyart, V. & Mamassian, P. Confidence controls perceptual evidence accumulation. Nat. Commun. 11, 1753 (2020).
Balsdon, T. & Philiastides, M. G. Confidence control for efficient behaviour in dynamic environments. Nat. Commun. 15, 9089 (2024).
Dotan, D., Meyniel, F. & Dehaene, S. On-line confidence monitoring during decision making. Cognition 171, 112–121 (2018).
Li, X., Su, R., Chen, Y. & Yang, T. Optimal policy for uncertainty estimation concurrent with decision making. Cell Rep. 42, 112232 (2023).
Gherman, S. & Philiastides, M. G. Neural representations of confidence emerge from the process of decision formation during perceptual choices. Neuroimage 106, 134–143 (2015).
Balsdon, T., Mamassian, P. & Wyart, V. Separable neural signatures of confidence during perceptual decisions. eLife 10, e68491 (2021).
Dou, W. et al. Neural signatures of evidence accumulation encode subjective perceptual confidence independent of performance. Psychol. Sci. 35, 760–779 (2024).
Goueytes, D. et al. Evidence accumulation in the pre-supplementary motor area and insula drives confidence and changes of mind. Nat. Commun. 16, 6998 (2025).
Xue, K., Zheng, Y., Rafiei, F. & Rahnev, D. The timing of confidence computations in human prefrontal cortex. Cortex 168, 167–175 (2023).
Kiani, R. & Shadlen, M. N. Representation of confidence associated with a decision by neurons in the parietal cortex. Science 324, 759–764 (2009).
Smith, J. D., Couchman, J. J. & Beran, M. J. The highs and lows of theoretical interpretation in animal-metacognition research. Philos. Trans. R. Soc. Lond. B 367, 1297–1309 (2012).
Komura, Y., Nikkuni, A., Hirashima, N., Uetake, T. & Miyamoto, A. Responses of pulvinar neurons reflect a subject’s confidence in visual categorization. Nat. Neurosci. 16, 749–755 (2013).
Shadlen, M. N. & Kiani, R. Decision making as a window on cognition. Neuron 80, 791–806 (2013).
Roitman, J. D. & Shadlen, M. N. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 22, 9475–9489 (2002).
Bondy, A. G., Haefner, R. M. & Cumming, B. G. Feedback determines the structure of correlated variability in primary visual cortex. Nat. Neurosci. 21, 598–606 (2018).
Henmon, V. A. C. The relation of the time of a judgement to its accuracy. Psychol. Rev. 18, 186–201 (1911).
Fetsch, C. R., Kiani, R. & Shadlen, M. N. Predicting the accuracy of a decision: a neural mechanism of confidence. Cold Spring Harb. Symp. Quant. Biol. 79, 185–197 (2014).
Kiani, R., Hanks, T. D. & Shadlen, M. N. Bounded integration in parietal cortex underlies decisions even when viewing duration is dictated by the environment. J. Neurosci. 28, 3017–3029 (2008).
Zylberberg, A., Barttfeld, P., Sigman, M. & Pereira, A. The construction of confidence in a perceptual decision. Front. Integr. Neurosci. 6, 79 (2012).
Adelson, E. H. & Bergen, J. R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284–299 (1985).
Resulaj, A., Kiani, R., Wolpert, D. M. & Shadlen, M. N. Changes of mind in decision-making. Nature 461, 263–266 (2009).
Caspi, A., Beutter, B. R. & Eckstein, M. P. The time course of visual information accrual guiding eye movement decisions during visual search. Proc. Natl Acad. Sci. USA 4, 743 (2004).
Shekhar, M. & Rahnev, D. How do humans give confidence? A comprehensive comparison of process models of perceptual metacognition. J. Exp. Psychol. Gen. 153, 656–688 (2024).
Kepecs, A., Uchida, N., Zariwala, H. A. & Mainen, Z. F. Neural correlates, computation and behavioural impact of decision confidence. Nature 455, 227–231 (2008).
Rolls, E. T., Grabenhorst, F. & Deco, G. Decision-making, errors, and confidence in the brain. J. Neurophysiol. 104, 2359–2374 (2010).
Sanders, J. I., Hangya, B. & Kepecs, A. Signatures of a statistical computation in the human sense of confidence. Neuron 90, 499–506 (2016).
Rausch, M., Hellmann, S. & Zehetleitner, M. Confidence in masked orientation judgments is informed by both evidence and visibility. Atten. Percept. Psychophys. 80, 134–154 (2018).
Yartsev, M. M., Hanks, T. D., Yoon, A. M. & Brody, C. D. Causal contribution and dynamical encoding in the striatum during evidence accumulation. eLife 7, e34929 (2018).
O’Connell, R. G. & Kelly, S. P. Neurophysiology of human perceptual decision-making. Annu. Rev. Neurosci. 44, 495–516 (2021).
Steinemann, N. et al. Direct observation of the neural computations underlying a single decision. eLife 12, RP90859 (2024).
Churchland, A. K. et al. Variance as a signature of neural computations during decision-making. Neuron 69, 818–831 (2011).
Kiani, R., Cueva, C. J., Reppas, J. B. & Newsome, W. T. Dynamics of neural population responses in prefrontal cortex indicate changes of mind on single trials. Curr. Biol. 24, 1542–1547 (2014).
Peixoto, D. et al. Decoding and perturbing decision states in real time. Nature 591, 604–609 (2021).
Lak, A. et al. Reinforcement biases subsequent perceptual decisions when confidence is low: a widespread behavioral phenomenon. eLife 9, e49834 (2020).
Pashler, H. Dual-task interference in simple tasks: data and theory. Psychol. Bull. 116, 220–244 (1994).
Sigman, M. & Dehaene, S. Parsing a cognitive task: a characterization of the mind’s bottleneck. PLoS Biol. 3, e37 (2005).
Kang, Y. H. R. et al. Multiple decisions about one object involve parallel sensory acquisition but time-multiplexed evidence incorporation. eLife 10, e63721 (2021).
Welford, A. T. The ‘psychological refractory period’ and the timing of high-speed performance—a review and a theory. Br. J. Psychol. 43, 2–18 (1952).
Hampton, R. R. Rhesus monkeys know when they remember. Proc. Natl Acad. Sci. USA 98, 5359–5362 (2001).
Middlebrooks, P. G. & Sommer, M. A. Neuronal correlates of metacognition in primate frontal cortex. Neuron 75, 517–530 (2012).
Vickers, D. Decision Processes in Visual Perception (Academic Press, 1979).
Luce, R. D. Response Times: Their Role in Inferring Elementary Mental Organization (Oxford University Press, 1986).
Baranski, J. V. & Petrusic, W. M. Probing the locus of confidence judgments: experiments on the time to determine confidence. J. Exp. Psychol. Hum. Percept. Perform. 24, 929–945 (1998).
Desender, K., Ridderinkhof, K. R. & Murphy, P. R. Understanding neural signals of post-decisional performance monitoring: an integrative review. eLife 10, e67556 (2021).
Pouget, A., Drugowitsch, J. & Kepecs, A. Confidence and certainty: distinct probabilistic quantities for different goals. Nat. Neurosci. 19, 366–374 (2016).
Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N. & Pouget, A. The cost of accumulating evidence in perceptual decision making. J. Neurosci. 32, 3612–3628 (2012).
Song, H. F., Yang, G. R. & Wang, X. J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).
Shadmehr, R., Reppert, T. R., Summerside, E. M., Yoon, T. & Ahmed, A. A. Movement vigor as a reflection of subjective economic utility. Trends Neurosci. 42, 323–336 (2019).
Drugowitsch, J., Moreno-Bote, R. & Pouget, A. Optimal decision-making with time-varying evidence reliability. in Advances in Neural Information Processing Systems 27 (eds Ghahramani, Z. et al.) 748–756 (Curran Associates, 2014).
Subramanian, D., Alers, A. & Sommer, M. A. Corollary discharge for action and cognition. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 4, 782–790 (2019).
Law, C. T. & Gold, J. I. Neural correlates of perceptual learning in a sensory-motor, but not a sensory, cortical area. Nat. Neurosci. 11, 505–513 (2008).
Boundy-Singer, Z. M., Ziemba, C. M. & Goris, R. L. T. Confidence reflects a noisy decision reliability estimate. Nat. Hum. Behav. 7, 142–154 (2022).
Mamassian, P. & de Gardelle, V. Modeling perceptual confidence and the confidence forced-choice paradigm. Psychol. Rev. 129, 976–998 (2022).
Zylberberg, A. & Shadlen, M. N. A population representation of the confidence in a decision in the parietal cortex. Cell Rep. 44, 115526 (2025).
So, N. Y. & Shadlen, M. N. Decision formation in parietal cortex transcends a fixed frame of reference. Neuron 110, 3206–3215 (2022).
Meister, M. L. R., Hennig, J. A. & Huk, A. C. Signal multiplexing and single-neuron computations in lateral intraparietal area during decision-making. J. Neurosci. 33, 2254–2267 (2013).
Cisek, P. Cortical mechanisms of action selection: the affordance competition hypothesis. Philos. Trans. R. Soc. Lond. B 362, 1585–1599 (2007).
Shadlen, M. N., Kiani, R., Hanks, T. D. & Churchland, A. K. Neurobiology of decision making: an intentional framework. in Better Than Conscious? Decision Making, the Human Mind, and Implications For Institutions (eds Engel, C. & Singer, W.) 71–101 (MIT Press, 2008).
Selen, L. P. J., Shadlen, M. N. & Wolpert, D. M. Deliberation in the motor system: reflex gains track evolving evidence leading to a decision. J. Neurosci. 32, 2276–2286 (2012).
Zylberberg, A., Ouellette, B., Sigman, M. & Roelfsema, P. R. Decision making during the psychological refractory period. Curr. Biol. 22, 1795–1799 (2012).
Hénaff, O. J., Boundy-Singer, Z. M., Meding, K., Ziemba, C. M. & Goris, R. L. T. Representation of visual uncertainty through neural gain variability. Nat. Commun. 11, 2513 (2020).
Walker, E. Y., Cotton, R. J., Ma, W. J. & Tolias, A. S. A neural basis of probabilistic computation in visual cortex. Nat. Neurosci. 23, 122–129 (2020).
Geurts, L. S., Cooke, J. R. H., van Bergen, R. S. & Jehee, J. F. M. Subjective confidence reflects representation of Bayesian probability in cortex. Nat. Hum. Behav. 6, 294–305 (2022).
Boundy-Singer, Z. M., Ziemba, C. M. & Goris, R. L. T. Sensory population activity reveals downstream confidence computations in the primate visual system. Proc. Natl Acad. Sci. USA 122, e2426441122 (2025).
Lee, T. S. & Mumford, D. Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A 20, 1434 (2003).
Haefner, R. M., Berkes, P. & Fiser, J. Perceptual decision-making as probabilistic inference by neural sampling. Neuron 90, 649–660 (2016).
Eastman, K. M. & Huk, A. C. PLDAPS: a hardware architecture and software toolbox for neurophysiology requiring complex visual stimuli and online behavioral control. Front. Neuroinform. 6, 1 (2012).
Brainard, D. H. The psychophysics toolbox. Spat. Vis. 10, 433–436 (1997).
Siegle, J. H. et al. Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. J. Neural Eng. 14, 045003 (2017).
Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365 (2019).
Lewis, J. W. & Van Essen, D. C. Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. J. Comp. Neurol. 428, 112–137 (2000).
Gnadt, J. W. & Andersen, R. A. Memory related motor planning activity in posterior parietal cortex of macaque. Exp. Brain Res. 70, 216–220 (1988).
Nguyenkim, J. D. & DeAngelis, G. C. Disparity-based coding of three-dimensional surface orientation by macaque middle temporal neurons. J. Neurosci. 23, 7117–7128 (2003).
Shan, H., Moreno-Bote, R. & Drugowitsch, J. Family of closed-form solutions for two-dimensional correlated diffusion processes. Phys. Rev. E 100, 032132 (2019).
Acknowledgements
The authors are grateful to the members of the Fetsch lab for their insight and discussions, and to R. Kiani (NYU) for providing code and inspiration. In addition, the authors thank O. Garalde, J. Killebrew, B. Nash and B. Quinlan for technical assistance. This work was supported by the National Institute of Neurological Disorders and Stroke (RF1NS132910 to C.R.F.), the E. Matilda Ziegler Foundation for the Blind (to C.R.F.) and a Whitehall Foundation Research Grant (2021-05-112 to C.R.F.). C.R.F. is also supported by the France-Merrick Foundation.
Author information
Contributions
M.V.-L. and C.R.F. conceived and designed the study, and drafted and edited the paper. M.V.-L. wrote the model-fitting and analysis code, and collected and analyzed the data.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Session-by-session sensitivity and amplitude parameters from fitting logistic and Gaussian functions to choice and RT, respectively.
Data points show individual behavioral sessions (N = 216), comparing the sensitivity (top) or RT function amplitude (bottom) for high-confidence (high-bet) vs. low-confidence (low-bet) trials. The vast majority of individual sessions showed greater sensitivity (accuracy) and faster RT amplitude when the monkey bet high vs. low. Left = monkey H; right = monkey G.
Extended Data Fig. 2 Detailed relationships between choice accuracy, RT, and confidence.
a, Accuracy as a function of RT quantile, split by motion strength (% coh); error bars represent ±s.e.m. centered around the mean. Accuracy decreased as a function of RT for every individual motion strength except 0% in both monkeys and 51.2% in monkey G (one-sided Cochran-Armitage test, Bonferroni-corrected α level = 0.0083; values ordered from lowest to highest motion strength; monkey H: P = (0.1974, 0.0014, 8.2096 × 10⁻⁶, 1.7365 × 10⁻⁵, 0.0019, 0.0008), T = (−0.8991, −4.8591, −10.4087, −10.9343, −6.0227, −5.0061), N = (15761, 14838, 15626, 15827, 10944, 17028); monkey G: P = (0.0240, 0.0065, 0.0005, 1.0224 × 10⁻⁵, 0.0002, 0.0095), T = (2.2883, −3.931, −6.0409, −11.9857, −7.1641, −4.6287), N = (15761, 14838, 15626, 15827, 10944, 17028)). b, Confidence (proportion of trials with a high bet) decreased as a function of RT. Error bars represent ±s.e.m. centered around the mean. Colors and order of statistical values as in a (one-sided Cochran-Armitage test; monkey H: P = (1.0987 × 10⁻⁴, 3.2480 × 10⁻⁵, 4.4356 × 10⁻⁶, 0.0019, 0.0054, 0.0234), T = (−6.9574, −9.8006, −11.4179, −3.8537, −4.5103, −2.8434), N = (17028, 12760, 11970, 13296, 10708, 7695); monkey G: P = (1.6467 × 10⁻¹⁰, 1.2407 × 10⁻⁸, 1.1916 × 10⁻⁸, 1.4939 × 10⁻⁸, 1.4052 × 10⁻⁷, 0.1305), T = (−36.7529, −37.2921, −37.5446, −36.1511, −18.9731, −1.3814), N = (13769, 15761, 14838, 15626, 15827, 10944)). c, Accuracy as a function of motion energy. Colors represent RT quintiles. Significant increases in accuracy were observed across all RT quintiles for both monkeys (interaction term between motion energy and RT quintile using ANCOVA; monkey H: P = 3.0838 × 10⁻¹², F(4, 84131) = 14.9735, N = 84141; monkey G: P = 9.2056 × 10⁻¹⁵, F(4, 117391) = 17.9693, N = 117401). Error bars represent ±s.e.m. centered around the mean. d, Confidence as a function of motion energy. Colors same as in c (interaction term between motion energy and RT quintile using ANCOVA; monkey H: P = 3.2351 × 10⁻⁴, F(4, 70175) = 5.24, N = 70185; monkey G: P = 0.0056, F(4, 94158) = 3.65, N = 94168).
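For readers unfamiliar with the trend test named in a and b, the following is a minimal sketch of a one-sided Cochran-Armitage test using the standard normal approximation. It is not the authors' analysis code: the function name, the equally spaced scores for RT quantiles, the example counts, and the family-wise level of 0.05 behind the Bonferroni threshold (0.05/6 ≈ 0.0083 for six motion strengths) are all assumptions, and the T values reported in the caption may follow a different convention.

```python
import math

def cochran_armitage_trend(successes, totals, scores=None):
    """Return (z, one-sided p) for a decreasing trend in proportions.

    successes[i] / totals[i] is the proportion in ordered group i
    (here: hypothetical RT quantiles). Uses the normal approximation.
    """
    k = len(successes)
    if scores is None:
        scores = list(range(k))  # equally spaced ordinal scores (assumption)
    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled proportion under the null
    # Trend statistic: score-weighted deviations from the pooled expectation.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    # Variance of the trend statistic under the null of no trend.
    var_t = p_bar * (1 - p_bar) * (
        sum(n * s**2 for s, n in zip(scores, totals))
        - sum(n * s for s, n in zip(scores, totals)) ** 2 / n_total
    )
    z = t_stat / math.sqrt(var_t)
    # Lower-tail probability P(Z <= z): small when proportions fall with score.
    p_decreasing = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_decreasing

# Made-up counts of correct trials in five RT quantiles (not data from the paper).
correct = [90, 80, 70, 60, 50]
trials = [100, 100, 100, 100, 100]
z, p = cochran_armitage_trend(correct, trials)
alpha = 0.05 / 6  # Bonferroni over six motion strengths, assuming a family-wise 0.05
print(f"z = {z:.3f}, significant decreasing trend: {p < alpha}")
```

A negative z with p below the corrected threshold corresponds to the pattern described in the caption, where accuracy or confidence falls as RT increases.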
Extended Data Fig. 3 Serial, parallel, and hybrid model fits on unconditioned data.
a–c, Parallel model fitted to choice, RT, and wager frequencies as a function of motion strength (% coh). d–f, Serial model, format same as a–c. g–i, Hybrid model, format same as a–c. a–i correspond to monkey G. j–r, Same as a–i but for monkey H.
Extended Data Fig. 4 Variance of the conditional expectation (VarCE) estimated for the left-high and left-low preferring populations of neurons.
In both monkeys (H = top, G = bottom) VarCE begins to increase at approximately 0.2 s from motion onset, then decreases near saccade onset. Colors represent the two populations, and the shaded regions are standard errors calculated using a bootstrap method. Solid dots represent mean values. Linear regression slope vs. a null hypothesis of zero, from 0.1 to 0.5 s after motion onset: left-high neurons (monkey H: β (slope) = 1.7506 ± 0.00636, P = 7.0789 × 10⁻¹¹⁸, t = 27.539; monkey G: β = −0.4717 ± 0.1165, P = 5.3067 × 10⁻⁵, t = −4.0505); left-low neurons (monkey H: β = 1.2771 ± 0.11562, P = 1.7098 × 10⁻²⁶, t = 11.046; monkey G: β = 0.8509 ± 0.1313, P = 1.1724 × 10⁻¹⁰, t = 6.4773). From −0.5 to −0.1 s relative to saccade initiation: left-high neurons (monkey H: β = −3.231 ± 1.3879, P = 0.0201, t = −1.3879; monkey G: β = −3.7338 ± 0.9079, P = 4.232 × 10⁻⁵, t = −4.1127); left-low neurons (monkey H: β = −3.4778 ± 1.6636, P = 0.03698, t = −2.0906; monkey G: β = −7.4379 ± 2.7085, P = 0.0061, t = −2.7461).
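As orientation for the quantity plotted here, VarCE is commonly obtained by subtracting an estimate of point-process variance from the total spike-count variance, following the law of total variance. The sketch below assumes that formulation, with a hypothetical constant `phi` scaling the mean count and made-up spike counts; the caption does not specify the authors' exact procedure, so the function names and the trial-resampling bootstrap are illustrative only.

```python
import random
from statistics import mean, stdev, variance

def var_ce(counts, phi):
    """Total spike-count variance minus phi * mean count.

    phi * mean is the assumed point-process variance; phi is a
    hypothetical constant, not a value taken from the paper.
    """
    return variance(counts) - phi * mean(counts)

def bootstrap_se(counts, phi, n_boot=1000, seed=0):
    """Standard error of VarCE from resampling trials with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = rng.choices(counts, k=len(counts))
        estimates.append(var_ce(resampled, phi))
    return stdev(estimates)

# Made-up spike counts from one time bin across eight trials.
counts = [2, 4, 6, 8, 3, 5, 7, 1]
print(var_ce(counts, phi=0.5), bootstrap_se(counts, phi=0.5, n_boot=200))
```

Repeating the calculation in successive time bins would give a VarCE time course whose slope can then be tested against zero, as in the regressions reported above.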
Extended Data Fig. 5 Correlation and pairwise difference between choice and wager decoder weights across the neural population.
a, Correlation between the magnitude of choice and wager decoder weights (red line) compared to the value expected by chance (shuffled data). b, Histogram of the difference between choice and wager decoder weights. Data are pooled from both monkeys.
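The comparison against shuffled data in a can be illustrated with a simple permutation test. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the number of shuffles, the one-sided p-value, and the example weights are hypothetical.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def shuffle_test(choice_w, wager_w, n_shuffles=1000, seed=0):
    """Correlate weight magnitudes and compare with a shuffled null."""
    rng = random.Random(seed)
    abs_choice = [abs(w) for w in choice_w]
    abs_wager = [abs(w) for w in wager_w]
    observed = pearson_r(abs_choice, abs_wager)
    null = []
    for _ in range(n_shuffles):
        shuffled = abs_wager[:]
        rng.shuffle(shuffled)  # break the neuron-by-neuron pairing
        null.append(pearson_r(abs_choice, shuffled))
    # One-sided p-value with a +1 correction so p is never exactly zero.
    p = (sum(r >= observed for r in null) + 1) / (n_shuffles + 1)
    return observed, p

# Made-up decoder weights for 21 hypothetical neurons.
choice_weights = [i - 10 for i in range(21)]
wager_weights = [0.5 * w for w in choice_weights]
observed, p = shuffle_test(choice_weights, wager_weights)
print(f"observed r = {observed:.3f}, shuffle p = {p:.4f}")
```

Shuffling one weight vector keeps each marginal distribution intact while removing the pairing between neurons, so the null distribution reflects the correlation expected by chance alone.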
Supplementary information
Supplementary Figs. 1–5.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Vivar-Lazo, M., Fetsch, C.R. Neural basis of concurrent deliberation toward a choice and confidence judgment. Nat Neurosci 29, 159–170 (2026). https://doi.org/10.1038/s41593-025-02116-9
DOI: https://doi.org/10.1038/s41593-025-02116-9








