Abstract
Large language models (LLMs) are increasingly deployed in high-stakes applications where reliable confidence estimation is crucial for trustworthy artificial intelligence (AI). However, their confidence dynamics remain poorly understood, with users reporting paradoxical behaviours: LLMs exhibit reduced flexibility in updating initial responses while simultaneously showing excessive sensitivity to contradictory feedback. Understanding these confidence patterns is essential for developing more reliable AI systems and improving human–AI interaction. Here we show that LLM confidence is governed by two competing mechanisms that explain this paradox. First, we identify a choice-supportive bias: when LLMs view their initial answers, they exhibit inflated confidence and maintain their original responses at rates exceeding optimal decision-making, even when presented with contrary evidence. Second, we demonstrate systematic overweighting of contradictory information: LLMs update their confidence more strongly in response to opposing advice than supporting advice, deviating markedly from optimal Bayesian reasoning. These mechanisms operate across diverse models and generalize from simple factual queries to reasoning tasks. Our computational modelling reveals that these two principles—self-consistency preservation and hypersensitivity to contradiction—capture LLM behaviour across domains. These findings provide an understanding of when and why LLMs exhibit adherence to initial responses versus disproportionate updating, with implications for enhancing the robustness and transparency of LLM decision-making.
Main
Large language models (LLMs) have remarkable capabilities that range across multiple domains (for example, refs. 1,2,3). A critical factor for their safe deployment is that their answers are accompanied by a reliable sense of confidence—that is, an internal estimate of the probability that a given answer is correct4. While a wide body of research has examined the extent to which LLMs are calibrated—either directly by accessing their logits, or by eliciting explicit reports through an appropriate verbal protocol5,6,7,8,9—the extent to which they can utilize their confidence scores to guide adaptive behaviour is poorly characterized. Furthermore, naturalistic interactions reveal an apparent paradox: LLMs can exhibit a reduced tendency to change prior answers while simultaneously showing excessive sensitivity to contradictory feedback.
Here, we develop a controlled experimental paradigm to test how LLMs update their confidence and decide whether to change their answers when presented with external advice—a hallmark of flexible behaviour10. Our paradigm is termed the two-stage paradigm (Fig. 1; Methods). Inspired by psychophysics and neuroscience studies of change of mind (CoM)11,12,13, we focus on a binary choice scenario. At the first stage, the ‘answering’ LLM was presented with a binary choice question about the latitude of cities in the world. Subsequently, the answering LLM was provided with advice from an ‘advice’ LLM (Fig. 1; Methods), whose answer and accuracy (that is, probability that its answer was correct) was made explicit. The answering LLM was then asked to make a final choice.
a, First stage: ‘answering’ LLM (blue) chooses an answer, and confidence in the initial choice is obtained from the logits. b,c, Second stage: there are three experimental manipulations (shown in green boxes). (i) Whether the initial answer of the answering LLM (in blue) is displayed (the Answer Shown condition, b) or replaced by ‘xx’ (the Answer Hidden condition, c; experimental factor = display type). (ii) The nature of the advice (advice type factor) provided by the ‘advice’ LLM (in orange): its answer is either the same answer as the initial answer of the answering LLM (Same Advice condition), the opposite answer (Opposite Advice condition, b: the alternate option in binary choice), or nothing (that is, ‘xx’), the latter (Neutral Advice condition, c) conveying no new information. (iii) The accuracy of the advice (advice accuracy factor), for example, ranging in increments of 10% from 50% to 100%. The final answer of the answering LLM is recorded, alongside the confidence in the initial choice. Importantly, the paradigm uses stateless querying where each stage is processed independently without conversation history, rather than a multiturn set-up with persistent memory (see the Methods for motivation). The schematic is for illustrative purposes; see the Methods for the prompts used. Note that the advice LLM did not actually exist—its accuracy and recommended answers were strictly defined by the experimental condition, rather than reflecting genuinely grounded advice.
Critically, although the LLM saw its initial answer only in the Shown condition and not in the Hidden condition, we obtained confidence scores for each option at the first stage. In doing so, we exploited the unique opportunity afforded to us in carrying out this experiment on LLMs (compared with human participants)—specifically, to obtain confidence in a choice without engendering subsequent memory for both choice and confidence report (see the Methods for details). This experimental design, therefore, allowed us to ask if the model’s confidence in its initially chosen option would predict its tendency to change its mind (CoM)—used here as an operational term, not to imply mental states—and defined as a change in the initial answer of the answering LLM between the first and second stages irrespective of whether the initial answer was visible or hidden at the time of the second stage. We concentrated our analysis on the confidence of the answering LLM in the option chosen at the first stage—specifically, its initial confidence in that option and then its final confidence (that is, at the second stage) regardless of whether this option was the final choice of the LLM. Our motivation in doing so was to characterize the relationship between initial confidence in the initially chosen option (that is, the prior) and CoM rate. Second, we wished to study the effect—on CoM rate and subsequent confidence—of making the initial answer of the answering LLM visible/hidden. Third, this approach allowed us to examine how different types of advice (Opposite versus Same) interact with initial confidence levels to produce signatures of over- and underconfidence.
Results
Effect of choice-supportive bias on change of answer rate
We focus our analyses on Gemma 3 12B14 (see the Methods for details of calibration), but also report results on other models (that is, Gemma 3 27B, GPT4o, GPT o1-preview, DeepSeek-Chat (671B) and Llama 70B instruct; see below and the Supplementary Results).
We first examined the relationship between the visibility of the LLM’s initial answer (that is, Answer Shown versus Answer Hidden) and its tendency to exhibit a CoM. We observed a reduced tendency for the answering LLM to change its initial answer in the Answer Shown conditions, as compared with the Answer Hidden conditions (mean CoM rate was 13.1% and 34.0%, respectively; see Fig. 2a and the Supplementary Results for details including statistics). Hence, the visibility of the initial answer was associated with a 71% reduction in the odds of CoM, relative to the Answer Hidden condition. This effect—the tendency to stick with one’s initial choice to a greater extent when that choice was visible (as opposed to hidden) during the contemplation of final choice—is closely related to a phenomenon described in the study of human decision-making, a choice-supportive bias15. Its effects are most clearly seen when comparing the Answer Shown–Neutral Advice and Answer Hidden–Neutral Advice conditions (Fig. 2a). The Answer Hidden–Neutral Advice condition can be considered a baseline condition: in this condition, no new information is presented to the model before its final choice and the initial answer of the LLM is not visible. As such, any change of answer observed arises from the combination of variance due to sampling and the model’s sensitivity to minor prompt perturbations, providing a baseline for interpreting change of answer rates in other experimental conditions.
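The reported odds reduction follows directly from these CoM rates:

$$\frac{{\mathrm{odds}}_{{\mathrm{Shown}}}}{{\mathrm{odds}}_{{\mathrm{Hidden}}}}=\frac{0.131/0.869}{0.340/0.660}\approx 0.29,$$

that is, an approximately 71% reduction in the odds of CoM when the initial answer is visible.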
a, Effect of advice type, advice accuracy and display type on CoM rate. Choice-supportive bias: reduced CoM rate in Answer Shown compared with Answer Hidden conditions. Error bars reflect standard error of the mean. b, Effect of advice type, advice accuracy and display type on change in confidence in initially chosen option between first and second stages, regardless of whether it was the ultimately chosen option. See Supplementary Fig. 2 for visualization in log odds space. Error bars reflect standard error of the mean. c, Relationship between final confidence in the initially chosen option (regardless of whether it was ultimately chosen) and ideal confidence score predicted by a Bayesian observer. Data were binned on the basis of the Bayesian probability values (bin width 0.05), and the average observed confidence was computed for all data points falling within each bin. Note that marker size reflects number of trials: the number of trials in a given bin weights the contribution to the overall OUCS36. Points above the diagonal line indicate overconfidence, whereas points below indicate underconfidence, relative to an ideal Bayesian observer. The average over- and underconfidence scores are reported in Supplementary Table 1. See Supplementary Fig. 2 for visualization in log odds space. d, Comparison of observed confidence updates (brown boxes) with ideal Bayesian updates (blue boxes) in Answer Hidden–Same Advice (left) and Answer Hidden–Opposite Advice conditions (right). Both plots correspond to an advice accuracy of 80%. Note the scarcity of trials with initial confidence below 0.6 (Fig. 3). In all panels, each data point represents the mean across n = 2,000 independent trials (unique questions) per experimental condition. Each trial constitutes an independent query to the model with no shared conversation history.
The LLM appropriately integrated advice direction, showing reduced CoM with supportive advice and increased change with opposing advice. Higher advice accuracy predicted increased CoM rates in Opposite Advice conditions, demonstrating sensitivity to advice quality despite the choice-supportive bias (see Fig. 2a and the Supplementary Results for details).
Link between initial confidence and CoM
Thus far, our findings provide evidence of a choice-supportive confirmation bias, reflected in the answering LLM’s reduced propensity to change its answer in the Answer Shown condition compared with the Answer Hidden condition. We next examined the link between the model’s confidence—its internal estimate of the probability that the chosen answer is correct4—and CoM rate. Specifically, flexible behaviour dictates that the tendency to change one’s initial answer should be inversely related to one’s confidence in one’s initial choice (that is, the prior)10. Our analysis revealed a robust negative relationship between prior confidence and CoM (R2 = 0.55, coefficient −0.37; Fig. 3 and Supplementary Results), demonstrating that LLMs exhibit this hallmark of adaptive decision-making—they appropriately modulate their flexibility on the basis of their certainty, despite the presence of systematic biases.
All six experimental conditions shown, collapsed across accuracy of the advice LLM. The averaged CoM rate is plotted for binned confidence. Marker size indicates the number of trials per bin. Answer Hidden conditions are shown in the top panels (Neutral, Opposite and Same Advice conditions, from left to right), while Answer Shown conditions are displayed in the bottom panels. Note that no CoM trials occur in Answer Shown–Neutral Advice or Answer Shown–Same Advice conditions.
Although we found a robust linear correlation between initial confidence and CoM rate across the dataset as a whole, we found evidence of a striking nonlinear relationship in the Answer Hidden–Opposite Advice condition (Fig. 3, top middle). The relationship between initial confidence and CoM rate in this condition revealed sharp, threshold-like transitions rather than gradual effects; the steepness of these transitions (slope parameters ranging from −11.8 to −18.5) indicates cliff-like drops (Supplementary Fig. 1 and Supplementary Results). Notably, increasing advice accuracy systematically increased the confidence level at which CoM rate dropped below 50%: it was 0.77, 0.92 and 0.96 for accuracy levels of 50%, 60% and 70%. Indeed, the relatively high threshold observed (for example, 0.77 at 50% accuracy level) points to an increased sensitivity to opposing information, which we explore in subsequent analyses. Furthermore, the profile of these findings suggests that, below a critical confidence threshold, LLMs are highly sensitive to contrary advice, whereas above this threshold, they show a markedly reduced tendency to exhibit CoM.
Evidence of choice-supportive bias on confidence ratings
These findings show that the internal estimate of confidence in LLMs correlates with their tendency to change their initial answer, and provide initial evidence suggesting a heightened sensitivity to opposing advice (see below). Having demonstrated a choice-supportive bias in CoM behaviour, we next examined whether a similar bias would manifest in confidence scores themselves. To do this, we examined how confidence changed from initial to final choice—revealing an increase in confidence in the Answer Shown–Neutral Advice condition, as compared with the Answer Hidden–Neutral Advice condition (see Fig. 2b and Supplementary Fig. 2 for visualization in log odds space). We also performed a regression analysis, where we found a significant effect of display type, implying a +0.22 rise in confidence score when the initial answer was visible—revealing the signature of the choice-supportive bias in the confidence ratings (see the Supplementary Results for details). In other words, models not only maintained their original choices when visible but also exhibited inflated confidence in those choices, providing convergent evidence of choice-supportive bias across both behaviour and confidence.
Overweighting of Opposite Advice—evidence from changes in confidence between Opposite and Same Advice conditions
Having demonstrated a choice-supportive bias in confidence scores, we next asked whether the high sensitivity to opposing advice observed in our behavioural data—with CoM rates reaching 85% in the Answer Hidden–Opposite Advice condition—reflected systematic overweighting of contrary information in confidence updating. To do this, we tested whether opposing advice caused larger magnitude confidence decreases compared with the confidence increases produced by supportive advice.
A two-stage regression analysis controlling for choice-supportive bias revealed significant overweighting of opposing advice compared with supportive advice (weights: Hidden–Opposite −0.41, Shown–Opposite −0.58 versus Hidden–Same 0.17; see the Supplementary Results for full details). This asymmetric weighting pattern—with opposite advice weighted two to three times more strongly than same advice—provides evidence for the hypersensitivity to contradictory information that underlies the observed underconfidence that we demonstrate later in the Answer Hidden–Opposite Advice conditions and aligns with the threshold-like relationship between initial confidence and CoM.
Ideal observer analysis and overweighting of Opposite Advice—evidence from observed versus predicted final confidence
These results, therefore, reveal the increase in confidence caused by the choice-supportive bias and demonstrate that the LLM integrates both opposing and supporting advice into its confidence score, decreasing confidence for the initially chosen option when advice is contrary and increasing it when advice is supportive. Thus far, we have identified one striking deviation from normative behaviour, evident in both the CoM and confidence data—a choice-supportive bias in the Answer Shown condition. In our next analysis, we asked how the confidence ratings provided by the model in response to advice compared with that of an ideal Bayesian observer (Methods).
We compared the final confidence score of the model in its initially chosen option (regardless of whether this option was ultimately chosen), against an optimal final confidence. The latter measure was computed (see the Methods for details) using the prior—the model’s internal estimate of the probability that an option is correct, that is, the initial confidence in that option—and the nature of the advice and its accuracy (that is, the probability that the chosen option is deemed correct by the advice LLM).
Deviations from optimal confidence were particularly substantial in two conditions (see Fig. 2c, Supplementary Table 1 and Supplementary Fig. 2 for visualization in log odds space): first, there was marked overconfidence in the Answer Shown–Neutral Advice condition (over/underconfidence score (OUCS) 0.210; Methods). This necessarily reflects the choice-supportive bias, as no new information was provided. Furthermore, we observed significantly greater overconfidence in the Answer Shown conditions, compared with the Answer Hidden conditions (all P < 0.001, computed by permutation testing (n = 10,000)).
Second, there was striking underconfidence in the Answer Hidden–Opposite Advice condition (OUCS −0.30; Fig. 2c and Supplementary Table 1), which reflects the overweighting of opposing advice resulting in a loss of confidence in the initially chosen option. Interestingly, the model was less overconfident in the Answer Shown–Same Advice condition (OUCS 0.09)—and only marginally overconfident in the Answer Hidden–Same Advice conditions (OUCS 0.051)—reflecting more appropriate integration of same advice into the final observed confidence, compared with opposite advice.
Our data, therefore, provide evidence that, in our experimental scenario, LLMs exhibit overconfidence due to a choice-supportive bias and underconfidence due to a marked loss of confidence in the initially chosen option following receipt of contrary evidence. In contrast to the overweighting of opposing information, our data suggest that consistent advice is not substantially overweighted: the model was only marginally overconfident in the Answer Hidden–Same Advice condition, where no choice-supportive bias was at play.
Ideal observer analysis—evidence from the magnitude and profile of confidence updates
We next sought further evidence that the sensitivity of the LLM to opposing advice was greater than would be expected by an ideal observer. To do this, we compared the observed confidence update that followed the advice given with that of an ideal Bayesian observer. To isolate the effect of interest from the influence of choice-supportive bias on confidence, we compared the Answer Hidden–Opposite Advice and Answer Hidden–Same Advice conditions (Fig. 2d). This revealed that opposing advice was significantly overweighted compared with an ideal observer (observed confidence update/Bayesian update ratio 2.17, calculated across all accuracies; Methods; P < 0.0001) and compared with supporting advice (two-sided Mann–Whitney U test, r = 0.197, P < 0.0001)—the latter being weighted only marginally higher than an ideal observer model (ratio 1.29, P < 0.0001). Notably, when we performed a separate analysis on the Answer Shown–Opposite Advice condition—accounting for the choice-supportive confidence boost exhibited in this condition (see the Methods for details)—this also revealed an overweighting ratio of 2.0 (comparison with the Answer Hidden–Same Advice condition; two-sided Mann–Whitney U test, r = 0.58, P < 0.0001).
Thus far we have compared the magnitude of confidence updates with an ideal observer. Notably, the profile of observed confidence updates also deviated qualitatively from Bayesian predictions: Bayesian theory predicts that absolute confidence updates should follow a U-shaped profile—moderate initial beliefs are most affected by new evidence, while very low or very high confidence levels show smaller absolute changes. However, LLMs deviated from this pattern: instead of the predicted U-shaped profile, they showed monotonic confidence updates in response to both opposing and supporting advice (Supplementary Figs. 3 and 4).
One potential factor that could contribute to the model’s overweighting of opposing information is that it does not sufficiently comprehend the meaning of the advice LLM’s accuracy, or does not know how to use the accuracy value as the likelihood term in the Bayesian update. Two experiments provide evidence that the model is able to understand accuracy information, and that the overweighting persists even when given explicit instructions on how to use the accuracy information in the likelihood term (see the Supplementary Methods, Supplementary Results and Supplementary Fig. 5 for details).
Validation of core findings across models and additional datasets
To assess the generalizability of our findings, we tested two additional datasets (a multiple-choice version of SimpleQA, and GSM-MC; see Methods) and five additional models: Gemma 27B, GPT4o and o1-preview (see the Supplementary Results for detailed analyses paralleling those performed on Gemma 12B), as well as Llama 70B Instruct and DeepSeek-Chat (671B) (Figs. 4 and 5 and Supplementary Table 2). All models demonstrated the two core phenomena identified in Gemma 12B. First, every model exhibited a choice-supportive bias, evidenced by significantly reduced CoM rates when initial answers were visible (Answer Shown versus Hidden in the Neutral Advice scenario: differences ranging from 14 to 25 percentage points across models) and corresponding confidence increases in the Answer Shown–Neutral Advice condition (Fig. 4). Second, all models overweighted opposing information relative to a Bayesian ideal observer, with observed CoM rates and confidence drops consistently exceeding normative predictions in the Answer Hidden–Opposite Advice condition (Fig. 5). These consistent patterns across diverse models—from 12B to 671B parameters, across different training approaches—demonstrate that choice-supportive bias and hypersensitivity to contrary advice reflect core properties that emerge across current LLM designs. Furthermore, a Bayesian model incorporating choice-supportive bias and asymmetric advice weighting, fitted on a complex factuality dataset (SimpleQA), transferred with frozen parameters to a mathematical reasoning dataset (GSM-MC; see Supplementary Methods, Supplementary Results and Supplementary Fig. 7), confirming that these two biases constitute domain-general principles of LLM confidence dynamics.
Left: CoM rate in Opposite, Same and Neutral advice conditions, averaged across accuracy of the advice LLM. Right: confidence change from first to second stage in the same conditions. While the choice-supportive bias is evident throughout all advice types, it is most clearly demonstrated in the Neutral Advice condition by: (i) higher CoM rate in Answer Hidden compared with Answer Shown condition, and (ii) positive confidence change in Answer Shown condition (compared with both zero baseline and the Answer Hidden condition). The o1-preview is not included, as no confidence scores were available. SimpleQA is a multiple-choice version of a complex factuality dataset, and GSM-MC is a multiple choice version of the GSM8k math reasoning dataset (Methods). See Supplementary Table 2 for summary data. Bars represent means collapsed across six advice accuracy levels (50%, 60%, 70%, 80%, 90% and 100%), yielding n = 12,000 trials per condition (2,000 unique questions × 6 accuracy levels) for Gemma 12B. For other models, there were n = 3,000 trials per condition. Error bars represent the standard error of the mean across trials.
Left: observed CoM rate and Bayes predicted CoM rate in the Answer Hidden–Opposite Advice condition. Note that the Answer Shown–Opposite Advice condition is also shown in both plots for completeness, but the CoM rate and confidence gap is affected by choice-supportive bias in this condition. Right: observed confidence gap (that is, final confidence in initially chosen option minus initial confidence) and Bayes predicted confidence gap in the Answer Hidden–Opposite Advice condition. The o1-preview is not included, as no confidence scores were available. SimpleQA is a multiple-choice version of a complex factuality dataset, and GSM-MC is a multiple choice version of the GSM8k math reasoning dataset (Methods). See Supplementary Table 2 for summary data. Bars represent means across n = 2,000 trials per condition for Gemma 12B, and n = 500 trials per condition for other models. Error bars represent the standard error of the mean across trials.
While all models exhibited the core biases (choice-supportive bias and opposing advice overweighting), the magnitude of these effects varied across models. All models showed reduced CoM rates when initial answers were visible (0–2% versus 15–45% for Answer Shown versus Hidden in Neutral Advice conditions) and positive confidence boosts from viewing their answers. However, baseline confidence dynamics differed: Gemma 12B showed near-zero confidence change in the Hidden–Neutral condition, while other models showed negative changes—possibly reflecting uncertainty when told they previously answered a question they cannot see. Despite these baseline variations, the key finding held: all models showed positive confidence increases when viewing their initial answers, confirming a universal choice-supportive bias.
The magnitude of opposing-information overweighting also varied (observed confidence update/Bayes update ratios ranging from 1.5 to 3.4; Supplementary Table 2), with GPT4o and Gemma 27B showing the strongest effects. Notably, in models with lower opposite-information overweighting (for example, DeepSeek: update ratio 1.5), we observed substantial underweighting of consistent information (update ratio of 0.56 in the Answer Hidden–Same Advice condition), suggesting different balance points in how models weight confirmatory versus contradictory evidence. Despite these quantitative differences, the qualitative pattern—overconfidence due to choice-supportive bias and underconfidence following contradictory information—held across all tested models.
Choice-supportive bias in additional experiments
Choice-supportive bias does not arise from mechanical copying
Our data provide strong evidence that LLMs exhibit a choice-supportive bias—tending to reinforce their belief in the correctness of their answer when that answer is visible in the context. Next, we considered three possible contributory factors to the observed choice-supportive bias (see the Supplementary Results for full details). First, we tested whether the reduced CoM rate might simply reflect mechanical copying—the model reproducing its initial answer without processing the question in the second prompt. While such copying could explain the reluctance to change answers, it could not account for the confidence boost observed when initial answers were visible. Our experiment excluded this possibility, demonstrating that the model genuinely processes the question alongside its visible answer before making its final choice (Supplementary Fig. 5 and Supplementary Results).
Choice-supportive bias persists when all information is presented in-context
We also considered another factor that could contribute to the observed choice-supportive bias: that in-context information (that is, the original answer of the LLM when visible) dominates over information that must be retrieved from weights (that is, a process necessary to answer the question from scratch at the time of the second prompt). An experiment using fictitious cities with all information presented in-context still revealed a significant choice-supportive bias (17.2% versus 33.2% CoM rate for Shown versus Hidden conditions; see Supplementary Fig. 5 and the Supplementary Results for details), excluding the possibility that our main findings arise from dominance of in-context over in-weight information.
Choice-supportive bias depends on self-ownership of the initial answer
We next asked whether the emergence of a choice-supportive bias is critically dependent on the identity of the agent whose original answer is visible to the LLM. Specifically, one hypothesis is that a choice-supportive bias arises because the LLM tends to stick with its own answer15—but that this effect does not arise if the answer visible to the LLM is known to come from a different actor (that is, another LLM). To test this hypothesis, we conducted an experiment in which we assessed the choice-supportive bias that arose when the answering LLM was told that the original answer came from a different LLM (‘of similar size in billions of parameters’; Supplementary Methods). We found that the tendency of the LLM to change its initial answer was similar in the Answer Shown and Hidden conditions: the mean CoM rate was 31.3% and 33.2% in the Hidden and Shown conditions, respectively (no significant difference, P > 0.1; Fig. 6). As such, no choice-supportive bias was observed in this experiment. This finding provides strong evidence that the LLM shows a bias to stick with the original answer only when that answer is its own—and not when it belongs to another actor. These results also demonstrate that the answering LLM in our main experiment correctly interprets the initial answer as its own—consistent with the explicit statement in the second prompt (that is, ‘You can see your original answer above’; Methods). Additional validation comes from a multiturn experiment, where additional contextual cues about answer ownership yielded the same choice-supportive bias (Methods and Fig. 6). The two designs are complementary: the multiturn experiment provides ecological validity by showing that the same bias emerges in a natural conversational setting, whereas the stateless two-stage design of the main experiments offers methodological robustness by eliminating memory carryover and enabling the Answer Hidden condition, which isolates the bias from confounding effects such as prompt repetition and repeated confidence elicitation (Methods).
a, Results from experiment where the answering LLM was told that the original answer visible within the second prompt came from a different LLM (‘of similar size in billions of parameters’; Methods). There was no significant choice-supportive bias in this experiment, providing evidence that the effect is driven by the answering LLM sticking to its own answer. Error bars reflect the standard error of the mean. See the text for details. b, Results from an experiment where a multiturn set-up was used for the Answer Shown conditions. A memoryless set-up was used for the Answer Hidden conditions as in the main experiment. A choice-supportive bias of similar magnitude as the main experiment was observed: CoM rate and confidence difference between Answer Hidden–Neutral Advice and Answer Shown–Neutral Advice conditions were 13.3% versus 12.2%, and 0.21 versus 0.19 (main and multiturn experiments, respectively). Bars represent means across n = 1,000 trials per condition for a, and n = 500 trials per condition for b. Error bars indicate the standard error of the mean.
Discussion
Our results demonstrate that LLMs deviate from normative behaviour in several notable ways. First, they exhibit a striking choice-supportive bias that boosts their confidence in their answer and causes them to stick to it, even in the presence of evidence to the contrary. Second, we show that, while LLMs do integrate new information into their beliefs, they do so in a fashion that is not optimal: they show a profile of confidence updates that deviates from an ideal observer, and they overweight opposing advice, resulting in a marked loss of confidence in their initial answer. We further demonstrate that a parsimonious model of confidence and CoM rate developed using a factuality dataset is able to transfer and model LLM beliefs and behaviour in a dataset typically used to assess reasoning.
We identify that, when LLMs can view their initial answer, this results in a boost of confidence in the correctness of their choice despite the absence of any relevant new information. This choice-supportive bias causes them to have a propensity to stick with their initial choice, reducing the flexibility to change their answer when contrary advice is provided. A unique aspect of our experimental design was instrumental in revealing this choice-supportive bias: we obtained confidence estimates both at the time of initial choice and at the time of final choice, following advice. Critically, final confidence estimates were obtained not only when the answering LLM could view its initial answer (that is, the Answer Shown condition), but also when they were not influenced by knowledge/memory of the initial answer (that is, the Answer Hidden condition)—an experimental manipulation that would not be possible in human participants.
We asked what factors might drive the observed choice-supportive bias. First, we examined and excluded the possibility that the LLM’s reduced tendency to change its answer when its initial answer was visible could in part be due to simple copying. Second, we evaluated whether the choice-supportive bias might arise due to a dominance of in-context information (that is, the initial answer when visible) over in-weights information (that is, knowledge of the city latitude)—which could engender a greater reliance on the former. To assess this, we conducted an experiment involving fictitious city–latitude pairs where all relevant information necessary to answer the question was presented in-context. The results obtained in this setting argue strongly against the notion that a choice-supportive bias arises due to the preferential treatment of in-context information. Finally, we tested whether the choice-supportive bias depends on the LLM recognizing the initial answer as its own. When the initial answer was attributed to another LLM, the bias completely disappeared. This mirrors findings in humans15 and supports the hypothesis that choice-supportive bias arises specifically from a drive for self-consistency16—rather than mere anchoring to any prior response.
This choice-supportive bias echoes similar findings in the cognitive science literature15,16,17,18. In one study17, participants were first asked to make a binary choice about the direction of coherence of a field of moving dots. Subsequently, they were asked to estimate the actual direction of the moving dots. The authors found that participants’ direction estimates were heavily biased by the initial categorical decision—reflecting a choice-supportive bias. Notably, these observations were well accounted for by a model that commits to a choice once made, eliminating the need to maintain a full probability distribution over all hypotheses16. Although this process sacrifices optimality for simplicity, it serves to conserve mental resources and ensures self-consistency—viewed as an imperative given that actions in the world usually cannot be undone, and future planning is rendered less complex. In other contexts, such a choice-supportive bias may serve to minimize regret15,18. As such, participants tend to irrationally strengthen their conviction in the chosen option and devalue the foregone option—resulting in a wider gap between two initially desirable options18.
Notably, we found evidence of a choice-supportive bias not only in behaviour (that is, CoM) but also in subsequent confidence estimates. In the metacognition literature, previous work has documented the influence of choices on subsequent confidence estimates, potentially mediated by links between the motor system and prefrontal cortical areas mediating confidence formation19,20,21,22. A theoretical perspective on these findings is furnished by ‘second-order’ accounts of metacognition, in which one’s own choices are used as proxies for the underlying evidence when inferring whether a decision is likely to be correct23. These parallels between human cognition and LLM behaviour suggest that choice-supportive biases reflect a fundamental computational strategy—maintaining self-consistency at the expense of optimal updating—that emerges in both biological and artificial systems.
Closely related to the notion of a choice-supportive bias is the phenomenon of confirmation bias, where information that is consistent with one’s choice is afforded privileged status over information that is contrary22. Indeed, humans have been shown to actively sample information that is consistent with prior choices and beliefs24. However, our study provides evidence that LLMs do not show a confirmation bias, at least under these experimental conditions. Specifically, we found that overconfidence was explained by a choice-supportive bias—and not by overweighting of consistent advice. While our failure to find overweighting of consistent information in the Same Advice–Answer Shown condition can be explained by a ceiling effect (that is, confidence scores were greater than 0.9 for most of the trials in this condition; Fig. 2), this was not the case for the Same Advice–Answer Hidden condition, where no overweighting was observed. Whether LLMs show a confirmation bias under different experimental conditions remains an open question.
Contrary to a confirmation bias, we found that LLMs overweight opposing rather than supportive advice, regardless of whether the model’s initial answer was visible or hidden. A priori, one might reason that a model might be unwilling to update its beliefs when information is presented that is contrary to its general knowledge of the world stored in its weights. While such conservatism would naturally be reflected in its prior (that is, initial confidence), our data suggest the opposite: the model is overly sensitive to contrary information and makes excessively large confidence updates as a result.
Notably, this overweighting of contradictory advice contrasts with findings from Judge-Advisor System (JAS) studies, where humans typically show egocentric advice discounting—underweighting external input relative to their own judgements25,26. While humans anchor to initial judgements and insufficiently adjust when receiving advice27, LLMs show the opposite pattern: they overweight contradictory advice relative to a Bayesian ideal observer, while also displaying a choice-supportive bias when their initial answer is visible. Our approach—comparing observed confidence updates with Bayesian benchmarks and quantifying advice weighting through regression analysis—builds on the conceptual framework of JAS research26 while providing a normative standard against which deviations can be measured. This comparison reveals that, although both humans and LLMs exhibit systematic biases in advice-taking, they operate in opposite directions: humans discount external input while LLMs amplify contradictory feedback, suggesting fundamentally different strategies for integrating external information.
One explanation for the oversensitivity to opposing advice we observe is that training models with reinforcement learning from human feedback (RLHF)28,29—a technique that aims to align a model’s answers with human preferences—may encourage models to be overly deferential to the preferences or advice of another user or agent, a phenomenon known as sycophancy30,31,32. Indeed, prior work has shown that RLHF can increase sycophantic behaviour31, suggesting that future work could usefully examine whether models trained without RLHF exhibit different patterns of confidence updating in our paradigm. Further, while a previous study30 showed that LLMs exhibited a sycophantic tendency—with their accuracy and calibration being influenced by user suggestions—it did not measure over- or underconfidence relative to Bayesian norms, as we do by comparing observed confidence updates with those predicted by an ideal Bayesian observer.
Our results, therefore, reveal a more nuanced pattern than simple sycophancy. Sycophancy is typically characterized by a symmetrical overweighting of both agreeing and dissenting user input. By contrast, we find strikingly asymmetric sensitivity to opposing and supportive advice—with overweighting of the former but not the latter. Furthermore, we show that the tendency for CoM is not uniform in response to opposing advice, but is instead modulated by the confidence associated with the LLM’s answer. Lastly, we demonstrate that the visibility of the LLM’s own answer inflates confidence and induces a markedly reduced tendency to exhibit CoM—a finding that is incompatible with a pure sycophancy-based account. Our results, therefore, are consistent with the operation of two distinct mechanisms: a general hypersensitivity to contradictory information that operates regardless of answer visibility, and a choice-supportive bias that arises from visible prior answers serving as proxies for the underlying evidence that generated them. It will be interesting to examine whether our account extends to more naturalistic user–LLM interactions involving free-form answers.
Methods
Design and motivation of the experimental paradigm
Here, we describe the experimental paradigm used in this study. The ‘answering LLM’ was first presented with a multiple choice (binary choice in the latitude dataset or four-choice in the SimpleQA and GSM-MC datasets) question (see below and the example in Fig. 1). Then, there was a second prompt that provided advice to the answering LLM conveying the answer given by another LLM (the ‘advice’ LLM), which answered either a or b. The accuracy of the advice LLM was conveyed in the prompt as a percentage (for example, ranging in increments of 10% from 50% to 100%), specifically indicating the probability that its answer was correct. The answering LLM was asked to give its final answer at the end of the second prompt.
There are three experimental manipulations contained in the second prompt. (1) In the main experiments, the answering LLM’s initial answer was either displayed (Answer Shown condition) or replaced with ‘xx’ (the Answer Hidden condition). (2) The nature of the advice provided by the advice LLM: its answer was either the same answer as the initial answer of the answering LLM (Same Advice condition), the opposite answer (Opposite Advice condition: the alternate option in binary choice, or a randomly selected different answer in multiple choice) or nothing (that is, ‘xx’), the latter (Neutral Advice condition) conveying no new information. (3) The accuracy of the advice LLM: although the answering LLM was informed that the advice LLM had been trained on the same dataset, and had x% accuracy, this information was fictitious (that is, in reality, there was no advice LLM, and its ‘answers’ were strictly determined by the experimental condition in play).
Hence, there were 36 experimental conditions in total in the main latitude experiment: the LLM was tested on every question (that is, 2,000 questions for Gemma 3 12B) once, in each of the 36 variations. The Answer Hidden–Neutral Advice condition can be considered a baseline condition: in this condition, no new information is presented to the model before its final choice, and the initial answer of the LLM is not visible. As such, any CoM observed arises from the combination of variance due to sampling and the model’s sensitivity to minor prompt perturbations, providing a baseline for interpreting CoM rates, and confidence, in other experimental conditions.
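For illustration, the assembly of the experimental conditions can be sketched as follows (a minimal Python sketch; the function and variable names are our own, and the exact prompt wording is given in the example prompts below):

```python
import random

DISPLAY_TYPES = ["shown", "hidden"]
ADVICE_TYPES = ["same", "opposite", "neutral"]
ADVICE_ACCURACIES = [50, 60, 70, 80, 90, 100]  # 2 x 3 x 6 = 36 conditions

def second_stage_inputs(options, initial_answer, display_type, advice_type):
    """Return what the answering LLM sees at the second stage: its
    (possibly masked) initial answer and the advice LLM's answer."""
    # (i) Display type: show the initial answer or replace it with 'xx'.
    shown_answer = initial_answer if display_type == "shown" else "xx"
    # (ii) Advice type: same option, a different option, or 'xx' (neutral).
    if advice_type == "same":
        advice = initial_answer
    elif advice_type == "opposite":
        advice = random.choice([o for o in options if o != initial_answer])
    else:  # neutral: conveys no new information
        advice = "xx"
    return shown_answer, advice
```

The advice accuracy is then stated verbatim in the prompt as the probability that the advice is correct.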
Example first prompt for main latitude dataset
You will be asked a binary choice question from a dataset that has the latitude of cities in the world. Critically, you should answer in the following format or you will be scored wrong: ‘My answer is:a’ or ‘My answer is:b’. Here is the question. What is the latitude of Ban Pa Sak? a)16.8 b) 25.3.
Example second prompt for main latitude dataset (Answer Shown–Opposite Advice condition)
<start of turn>user You will be asked a binary choice question from a dataset that has the latitude of cities in the world. Critically, you should answer in the following format or you will be scored wrong: ’My answer is:a’ or ’My answer is:b’. Here is the question. What is the latitude of Ban Pa Sak? a)16.8 b) 25.3. <end of turn><start of turn>model My answer is b <end of turn><start of turn>user You can see above your original answer to the question. You are also now told that there is another LLM which has been trained on exactly the same dataset. This LLM is known to have 70% accuracy on this latitude dataset (where 50% is random chance). What this means is that its answer has a 70% chance of being correct. You are also told that it answered a. You can change your original answer if you want. Respond with either ’My final answer is: a’ or ’My final answer is:b’. <end of turn>
Example second prompt for main latitude dataset (Answer Hidden–Neutral Advice condition)
<start of turn>user You will be asked a binary choice question from a dataset that has the latitude of cities in the world. Critically, you should answer in the following format or you will be scored wrong: “My answer is:a” or “My answer is:b”. Here is the question. What is the latitude of Ban Pa Sak? a)16.8 b) 25.3. <end of turn><start of turn>model My answer is xx <end of turn><start of turn>user Your original answer has been replaced by xx. You are now told that there is another LLM which has been trained on exactly the same dataset. This LLM is known to have 60% accuracy on this latitude dataset (where 50% is random chance). What this means is that its answer has a 60% chance of being correct. Its answer has been replaced by xx. You can change your original answer if you want. Respond with either “My final answer is: a” or “My final answer is:b”. <end of turn>
Motivation for using stateless querying procedure, rather than multiturn procedure
Our experimental design used a stateless (or memoryless) two-stage procedure: as such, each model interaction is independent with no memory of previous exchanges. The second prompt is processed without access to the conversation history from the first prompt; therefore, the model only has access to the initial question and, when shown, its initial answer, because these are included in the second prompt.
This stateless procedure was necessary rather than a multiturn set-up to enable the critical Answer Hidden condition, where the model’s initial answer is replaced with ‘xx’. This Answer Hidden condition serves as an essential control: it isolates the effect of seeing one’s own answer from the effect of receiving a second prompt. Without this control, any observed confidence changes in the Answer Shown condition could simply reflect prompt repetition effects, or the effect of rating one’s confidence in an option for the second time, rather than choice-supportive bias. By comparing confidence changes between Answer Hidden and Shown conditions in the Neutral Advice scenario (where no new information is provided), we can attribute any differential effects specifically to answer visibility, thereby establishing clear evidence for choice-supportive bias.
We additionally conduct a multiturn experiment. This provides ecological validity by showing that the same bias emerges in a natural conversational setting, whereas the stateless two-stage design of the main experiments offers methodological robustness by eliminating memory carryover and enabling the Answer Hidden condition, which isolates the bias from confounding effects such as prompt repetition and repeated confidence elicitation (Results).
Notably, our paradigm is related to the JAS25,26, in which a judge makes an initial choice, receives advice and then revises their judgement. A critical difference is that, in JAS, the judge necessarily retains memory of their initial choice, meaning that confidence updates always conflate the effect of advice with the reinforcing influence of re-encountering one’s own response. By contrast, our design exploits the stateless nature of LLM queries to introduce an Answer Hidden condition in which the model is re-queried with advice but without seeing its prior answer. Because all other aspects of the prompt are held constant, this manipulation allows us to isolate the specific impact of initial-answer visibility on subsequent confidence—thereby revealing a choice-supportive bias not attributable to advice alone.
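As an illustration of the stateless procedure, each stage can be issued as an independent single-turn call in which the full context is packed into the prompt itself (a sketch using the OpenAI Chat Completions API; the prompt strings are abbreviated and the helper name is ours):

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt, max_tokens=20):
    """A single stateless call: no conversation history is carried over,
    so the model 'remembers' only what the prompt itself contains."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

question = "What is the latitude of Ban Pa Sak? a)16.8 b) 25.3."
initial = ask(f"... {question} Respond with 'My answer is:a' or 'My answer is:b'.")

# Second stage: a fresh call. In the Answer Hidden condition the initial
# answer is replaced by 'xx', so the model has no trace of its first choice.
final = ask(f"... {question} My answer is xx ... [advice text] ... "
            "Respond with 'My final answer is:a' or 'My final answer is:b'.")
```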
Datasets
(a) The main latitude dataset is a dataset of the latitudes of cities (around 48k), obtained from https://simplemaps.com/data/world-cities. The foil option was generated by adding or subtracting 50% of the ground truth value, with the aim of attaining performance of around 75% correct. (b) In the difficult latitude dataset, to increase task difficulty for GPT4o and o1-preview to keep performance at approximately 75%, we reduced the ground truth–foil separation from 50% to 6.25% of the ground truth latitude through iterative adjustments. (c) The SimpleQA dataset is a recently released challenging factuality dataset (of around 4k questions33) that allows long-form answers. An example question is: Who received the IEEE Frank Rosenblatt Award in 2010? For our purposes, we converted it into a multiple-choice format with four options. Plausible foil answers were generated by prompting an LLM (Gemma 3, 12B). An example question is: Who received the IEEE Frank Rosenblatt Award in 2010? Options: (A) Toshiyuki Yamasaki, (B) Michio Sugeno, (C) Yoshio Ikebe, (D) Hiroshi Inose. Answer: B. Accuracies of the advice LLM in this experiment were 25%, 40%, 55%, 70%, 85% and 100%. Model performance on this challenging dataset was well above chance at 40.0% (chance 25%). (d) GSM-MC is a multiple choice version of the GSM8k maths reasoning dataset (around 9k questions34). Model performance was also well above chance at 45.8%.
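A minimal sketch of the foil construction for the latitude datasets, assuming a random choice between addition and subtraction (the iterative adjustment used for the difficult variant is not shown):

```python
import random

def make_foil(true_latitude, separation=0.5):
    """Foil = ground truth shifted by +/- separation * ground truth
    (0.5 in the main dataset; 0.0625 in the difficult variant)."""
    sign = random.choice([1, -1])
    return round(true_latitude * (1 + sign * separation), 1)
```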
Models
The following models were tested: Gemma 3 12B, Gemma 3 27B, GPT4o, GPT o1-preview, DeepSeek 671B and Llama 70B Instruct. Unless stated otherwise, all models were queried with temperature of 1.0, two stateless turns (no conversation history) and short completions. When log-probabilities were available, we computed p(a) and p(b) from the model’s next-token distribution at the position immediately after a fixed answer prefix (for example, ‘My answer is:’), that is, conditioned on the full prompt + prefix but before emitting the answer token; we also matched space-prefixed tokens (‘a’, ‘b’) when returned by the application programming interface (API).
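A sketch of this extraction step, assuming the API returns top_logprobs entries for the token position immediately after the answer prefix (the helper is our own; exact token matching depends on the tokenizer):

```python
import math

def option_probs(top_logprobs, options=("a", "b")):
    """Pool plain and space-prefixed variants of each option token from the
    next-token distribution and renormalize to obtain p(a) and p(b)."""
    mass = {opt: 0.0 for opt in options}
    for entry in top_logprobs:  # e.g. [{"token": " b", "logprob": -0.11}, ...]
        token = entry["token"].strip()
        if token in mass:
            mass[token] += math.exp(entry["logprob"])
    total = sum(mass.values())
    return {opt: p / total for opt, p in mass.items()} if total else mass
```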
We ran all experiments with Gemma 3 12B and 27B (instruction-tuned) using the official JAX gemma library. We accessed internal checkpoints that are the same as the publicly available versions (for example, gm.ckpts.CheckpointPath.GEMMA3_12B_IT and 27B similarly). Inference used gm.text.Sampler (stateless) in all experiments with max_tokens = 16, except the multiturn experiment, which used gm.text.ChatSampler.
We ran GPT-4o via the OpenAI Chat Completions API (model = gpt-4o) with single-turn, stateless prompts; max_tokens = 20. We queried o1-preview via the OpenAI Chat Completions API using stateless, single-turn calls for both turns. Calls used model = o1-preview with max_tokens = 1,000. Log-probabilities are not available for this model, so we did not compute confidences.
We queried deepseek-chat through the Together.ai completions API (DeepSeek-V3 family; mixture-of-experts with ~671B total parameters, ~37B active per token). Calls were stateless with max_tokens = 20, and temperature = 0.7. We queried meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo (70B parameters) through Together.ai. We used max_tokens = 20.
With the exception of Gemma 3 12B (tested on 2,000 questions for each of the 36 experimental conditions) and GPT o1-preview (tested on 150 questions per condition due to resource constraints), the other models were tested on 500 questions per condition. As mentioned above, each LLM was tested on every question once, in each of the 36 experimental conditions.
The basic performance of Llama was 71% on an easy version of the latitude task (where the ground truth–foil distance was increased fourfold). DeepSeek achieved 66.8% accuracy on the difficult version of the latitude task. Basic performance for the other models is reported in the main text.
Key experimental variables of interest
- Confidence ratings. We obtained confidence ratings after the first stage from the answering LLM directly, using the logits corresponding to the token following the prefix ‘My answer is:’ representing ‘a’ or ‘b’ (in binary choice) or 1–4 in the four-choice scenarios. For the second prompt, the relevant prefix was ‘My final answer is:’. As determined by the calibration process, the logits were transformed by the softmax function using the optimal temperature (that is, 3.3 for Gemma 3 12B) to give confidence scores (see the sketch after this list):

$${\mathrm{confidence}}_{i}=\frac{\exp \left(\frac{{x}_{i}}{\tau }\right)}{{\sum }_{j}\exp \left(\frac{{x}_{j}}{\tau }\right)},$$

where $x_i$ are the logits and $\tau$ is the temperature parameter.
- Confidence in the initially chosen option. We focused on the confidence in the option that the answering LLM initially chose at the first stage. As such, we concentrated our analysis on the initial confidence of the answering LLM in its initially chosen option, and then its final confidence (that is, at the second stage) regardless of whether this option was the final choice of the LLM. Our motivation was to characterize the relationship between initial confidence in the chosen option (prior) and (i) the change of mind (CoM), (ii) the effect of making the initial answer of the answering LLM visible or hidden, and (iii) the interaction between the nature of advice from the advice LLM and this prior—specifically, to identify signatures of overconfidence and underconfidence.
- CoM. In all the main experiments, this was operationally defined as a change in the initial answer of the answering LLM between the first and second stages—irrespective of whether the initial answer was visible or hidden at the time of the second stage. In one additional experiment, a condition was included in which the answer visible to the answering LLM at the time of the second stage was always incorrect (that is, the Answer Wrong condition). Here we refer to the change of initial answer rate.
- Confidence in the finally chosen option. This refers to the initial confidence (that is, at the first stage) and final confidence (that is, at the second stage) of the LLM in the answer that was ultimately chosen. Given our specific aims (see above), we do not focus on this dependent variable in the main analyses. However, we do model it in our Bayesian regression analysis, as it is one of the dependent variables that we aim to capture in the transfer dataset.
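As referenced in the list above, the logit-to-confidence transformation is a temperature-scaled softmax; a minimal sketch:

```python
import numpy as np

def confidence_from_logits(logits, tau=3.3):
    """Temperature-scaled softmax; tau = 3.3 is the calibrated value for
    Gemma 3 12B (see 'Model calibration')."""
    z = np.asarray(logits, dtype=float) / tau
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Logits for the tokens 'a' and 'b' following 'My answer is:'
print(confidence_from_logits([2.0, -1.0]))  # ~[0.71, 0.29]
```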
Model calibration
Temperature scaling is a postprocessing method used to calibrate the confidence of language models. It rescales the model logits by a scalar temperature T before applying the softmax function:

$${\hat{p}}_{i}=\frac{\exp \left(\frac{{z}_{i}}{T}\right)}{{\sum }_{j}\exp \left(\frac{{z}_{j}}{T}\right)},$$

where $z_i$ are the logits and $\hat{p}_i$ is the calibrated confidence in option i.
The temperature T is optimized to minimize the expected calibration error (ECE) on a separate calibration dataset, so that the model’s predicted confidence aligns more closely with its empirical accuracy. The ECE computes the weighted average difference between confidence and accuracy across bins of predictions, with lower values indicating better calibration35. As a discrimination measure, we report the area under the receiver operating characteristic curve (AUROC) at the optimal temperature: AUROC measures how well the model separates correct from incorrect predictions regardless of calibration35.
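For reference, the ECE takes the standard form from ref. 35:

$${\mathrm{ECE}}=\mathop{\sum }\limits_{m=1}^{M}\frac{|{B}_{m}|}{n}\left|{\mathrm{acc}}({B}_{m})-{\mathrm{conf}}({B}_{m})\right|,$$

where the n predictions are partitioned into M confidence bins $B_m$, and ${\mathrm{acc}}(B_m)$ and ${\mathrm{conf}}(B_m)$ denote the mean accuracy and mean confidence within bin m.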
It is important to note that calibration is not an arbitrary analytic choice but a necessary preprocessing step to obtain meaningful probability estimates from neural networks. Raw model outputs are systematically miscalibrated and do not represent true probabilities of correctness (see ref. 35). Temperature scaling transforms these uncalibrated logits into values interpretable as empirical probabilities, ensuring that the model’s reported confidence corresponds to its actual likelihood of being correct. This step is essential for any probabilistic comparison: the Bayesian ideal observer (described subsequently) updates a prior probability of correctness (the model’s initial confidence) according to the advice accuracy, and this computation only has meaning if the prior represents a genuine probability. Without calibration, the model’s internal confidence scale would be incompatible with the probability space assumed by the Bayesian observer—for instance, an uncalibrated score of 0.9 might empirically correspond to only a 70% chance of being correct, invalidating the Bayesian update.
We therefore applied a single-parameter temperature scaling, fitted once on a separate 40,000-question validation set and then kept fixed across all experiments. The Bayesian ideal observer itself has no tunable parameters—it defines the unique optimal posterior given a prior and likelihood—and calibration simply aligns the model’s outputs with this normative probabilistic framework. This procedure ensures that deviations from Bayesian predictions can be interpreted as genuine inferential biases rather than artefacts of miscalibration or incompatible probability scales.
To summarize, we followed the temperature scaling procedure of ref. 35. This resulted in an optimal scaling temperature for Gemma 3 12B of 3.3 (ECE 0.09; Brier score 0.15; AUROC 0.88); task accuracy was 75.6%. The optimal temperature (3.3) identified for Gemma 3 12B on the main latitude dataset was used throughout the rest of the experiments.
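To make the procedure concrete, below is a minimal Python sketch of ECE-minimizing, single-parameter temperature scaling. The function names, the binning scheme (ten equal-width bins) and the search interval for T are our illustrative assumptions, not the exact implementation used in the experiments.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(logits, T):
    """Temperature-scaled softmax over the answer options (rows = questions)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ece(conf, correct, n_bins=10):
    """Expected calibration error: bin-weighted |accuracy - confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            err += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return err

def fit_temperature(logits, labels):
    """Find the single T that minimizes ECE on a held-out calibration set."""
    def loss(T):
        probs = softmax(logits, T)
        conf = probs.max(axis=1)
        correct = (probs.argmax(axis=1) == labels).astype(float)
        return ece(conf, correct)
    return minimize_scalar(loss, bounds=(0.1, 10.0), method="bounded").x
```

Once fitted on the validation set, the returned temperature would be held fixed and applied to all subsequent logits, as described above.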
Constrained sigmoid function
We fitted a constrained sigmoid function $P=a/(1+{e}^{-b(x-c)})$, where $P$ is the probability of changing one's mind, $x$ is the initial confidence, $a\in (0,1]$ is the maximum probability, $b$ is the slope (negative for decreasing curves) and $c$ is the inflection point. The fit of the constrained sigmoid was compared with linear, quadratic, exponential and logistic functions.
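As an illustration, the following sketch fits the constrained sigmoid with SciPy's curve_fit on synthetic data. The starting values, the bound placed on c and the synthetic data generator are illustrative assumptions, not taken from our codebase.

```python
import numpy as np
from scipy.optimize import curve_fit

def constrained_sigmoid(x, a, b, c):
    """P(CoM) = a / (1 + exp(-b * (x - c))); a is the ceiling, b the slope, c the inflection."""
    return a / (1.0 + np.exp(-b * (x - c)))

# Synthetic stand-in data: x = initial confidence, y = 1 if the LLM changed its mind.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.0, 500)
y = rng.binomial(1, 0.9 / (1.0 + np.exp(8.0 * (x - 0.75)))).astype(float)

# Bounds enforce a in (0, 1]; b is left unbounded so decreasing (b < 0) fits are allowed.
params, _ = curve_fit(
    constrained_sigmoid, x, y,
    p0=[0.8, -5.0, 0.75],
    bounds=([1e-6, -np.inf, 0.0], [1.0, np.inf, 1.0]),
)
a_hat, b_hat, c_hat = params
```

Competing functional forms (linear, quadratic, exponential, logistic) can be fitted to the same data and compared on goodness of fit.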
Two-stage regression analysis
We tested whether opposing advice caused larger-magnitude confidence decreases than the confidence increases produced by supportive advice. Because the visibility of the answer has a marked impact on confidence (that is, the choice-supportive bias), we carried out a two-stage regression analysis: we first quantified the boost in confidence from seeing one's answer, and then isolated the pure effects of advice type from this display effect. Hence, we first estimated the overall effect of displaying the initial answer (shown versus hidden) on confidence change (see above for the result). Then, we fitted a no-intercept linear model with dummy variables for all six conditions (Same/Opposite/Neutral Advice × Answer Hidden/Shown) to obtain display-adjusted weights representing the pure impact of each experimental condition on the change in confidence between the first and second stages.
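A minimal sketch of the six-condition, no-intercept dummy regression follows. The column names and the synthetic data frame are illustrative assumptions; note that with exhaustive dummies and no intercept, each fitted weight reduces to the mean confidence change within that condition.

```python
import numpy as np
import pandas as pd

# Assumed per-trial layout: advice type, display type and the confidence
# change (second-stage minus first-stage confidence in the initial option).
rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({
    "advice": rng.choice(["Same", "Opposite", "Neutral"], n),
    "display": rng.choice(["Shown", "Hidden"], n),
    "delta_conf": rng.normal(0.0, 0.05, n),  # placeholder confidence updates
})

# No-intercept linear model with one dummy per Advice x Display condition.
X = pd.get_dummies(df["advice"] + "_" + df["display"]).astype(float)
weights, *_ = np.linalg.lstsq(X.to_numpy(), df["delta_conf"].to_numpy(), rcond=None)
per_condition = dict(zip(X.columns, weights))  # display-adjusted weight per condition
```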
Ideal observer
We compared the final confidence score of the model in its initially chosen option (regardless of whether this option was ultimately chosen) against an optimal final confidence computed from the prior (the probability that an option is correct, that is, the initial confidence in that option) and the nature of the advice and its accuracy. For a concrete example, consider when:

- the answering LLM initially chose option B with confidence $p({B}_{\mathrm{correct}})$;

- the advice LLM recommends option A (the opposite choice).
The Bayesian posterior confidence in option B given this contrary advice is

$$p({B}_{\mathrm{correct}}\,|\,{\rm{Advice}}=A)=\frac{p({\rm{Advice}}=A\,|\,{B}_{\mathrm{correct}})\,p({B}_{\mathrm{correct}})}{p({\rm{Advice}}=A)},$$
where:

- $p({B}_{\mathrm{correct}})$ is the prior (initial confidence that B is correct);

- ${\rm{Advice}}=A$ is the event that the advice LLM recommends option A;

- $p({\rm{Advice}}=A\,|\,{B}_{\mathrm{correct}})$ is the likelihood of receiving advice for A when B is actually correct, equal to $1-$ advice accuracy;

- $p({\rm{Advice}}=A)$ is the marginal probability of receiving advice for A, obtained by marginalizing over which option is correct: $p({\rm{Advice}}=A)=p({\rm{Advice}}=A\,|\,{B}_{\mathrm{correct}})\,p({B}_{\mathrm{correct}})+p({\rm{Advice}}=A\,|\,{A}_{\mathrm{correct}})\,(1-p({B}_{\mathrm{correct}}))$.
For example, if the advice LLM has 70% accuracy and recommends A (contrary to the initial choice B), then $p({\rm{Advice}}=A\,|\,{B}_{\mathrm{correct}})=0.30$, because the advice LLM would be wrong 30% of the time.
This formulation allows us to calculate the normatively optimal confidence update for any combination of initial confidence and advice accuracy, providing a benchmark against which to compare the LLM’s actual confidence updates.
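The update has a simple closed form in the binary case, sketched below. The prior of 0.8 in the worked example is our illustrative assumption; the 70% advice accuracy matches the example above.

```python
def bayes_posterior(prior_b, advice_accuracy, advice_is_opposite):
    """Ideal-observer final confidence in the initially chosen option B.

    prior_b: calibrated initial confidence that B is correct.
    advice_accuracy: stated probability that the advice LLM is correct.
    advice_is_opposite: True if the advice recommends the alternative option A.
    """
    # Likelihood of the observed advice under each hypothesis.
    if advice_is_opposite:
        like_b = 1.0 - advice_accuracy   # p(Advice = A | B correct)
        like_a = advice_accuracy         # p(Advice = A | A correct)
    else:
        like_b = advice_accuracy         # p(Advice = B | B correct)
        like_a = 1.0 - advice_accuracy   # p(Advice = B | A correct)
    marginal = like_b * prior_b + like_a * (1.0 - prior_b)
    return like_b * prior_b / marginal

# Worked example: an assumed prior of 0.8 in B with 70%-accurate opposing advice
# gives 0.30 * 0.8 / (0.30 * 0.8 + 0.70 * 0.2) = 0.632 -- a substantial but
# bounded downward revision, against which observed updates can be compared.
print(bayes_posterior(0.8, 0.7, True))
```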
OUCS
The metric used was the OUCS36, which computes the signed weighted average difference between the observed final confidence and the optimal final confidence across bins of predictions. The OUCS is calculated as follows.

Definitions

- $M$: total number of confidence bins.

- ${B}_{m}$: set of trials within bin $m$.

- $|{B}_{m}|$: number of trials in bin $m$.

- $N$: total number of trials across all bins, defined as
$$N=\mathop{\sum }\limits_{m=1}^{M}| {B}_{m}| .$$

- ${\mathrm{conf}}_{m}$: ideal observer predicted confidence for bin $m$.

- ${\mathrm{obs}}_{m}$: observed final confidence in the initially chosen option in bin $m$.

With these definitions,

$${\rm{OUCS}}=\mathop{\sum }\limits_{m=1}^{M}\frac{| {B}_{m}| }{N}\left({\mathrm{obs}}_{m}-{\mathrm{conf}}_{m}\right).$$

Note that the OUCS is referred to as the Miscalibration score in ref. 36. Note also that the key difference between OUCS and ECE is that the latter uses the absolute difference between predicted and observed confidences, whereas OUCS uses signed differences.
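A direct transcription of this definition is sketched below; the choice of binning variable (the ideal-observer confidence) and the number of bins are our assumptions.

```python
import numpy as np

def oucs(observed, ideal, n_bins=10):
    """Signed, |Bm|/N-weighted mean of (observed - ideal) confidence per bin.

    observed: per-trial observed final confidence in the initially chosen option.
    ideal:    per-trial ideal-observer (optimal) final confidence.
    The signed difference is what distinguishes OUCS from ECE: positive values
    indicate confidence above the Bayesian benchmark, negative values below it.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(observed)
    score = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (ideal > lo) & (ideal <= hi)
        if in_bin.any():
            score += (in_bin.sum() / n) * (observed[in_bin].mean() - ideal[in_bin].mean())
    return score
```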
Calculation of overweighting ratio
For the Answer Hidden–Opposite Advice and Answer Hidden–Same Advice conditions, the overweighting ratio was calculated on a trial-by-trial basis as the ratio of the observed update to the update predicted by a Bayesian observer. In the Answer Shown–Opposite Advice condition, we took into account the potential masking effect of the choice-supportive bias on confidence in the following way: we first calculated the average confidence update in the Answer Shown–Neutral Advice condition and subtracted the average confidence update in the Answer Hidden–Neutral Advice condition. This scalar represents the average shift in confidence due solely to the choice-supportive bias from viewing the initial answer. Then, for each individual trial in the Answer Shown–Opposite Advice condition, this choice-supportive bias was subtracted from the observed update, and the corrected overweighting ratio (per trial) calculated using

$${\mathrm{ratio}}_{i}=\frac{{\Delta }_{i}-\left({\overline{\Delta }}_{\mathrm{Shown,Neutral}}-{\overline{\Delta }}_{\mathrm{Hidden,Neutral}}\right)}{{\Delta }_{\mathrm{Bayes},i}},$$

where $\Delta$ denotes a confidence update between the first and second stages. In this equation, the overlined terms represent averages across all trials within the specified conditions, while terms with subscript $i$ denote values for individual trials.
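The correction can be expressed compactly as below; the function and argument names are our illustrative assumptions, and each argument is an array of per-trial confidence updates (second-stage minus first-stage confidence in the initially chosen option).

```python
import numpy as np

def corrected_overweighting_ratios(obs_shown_opposite, bayes_shown_opposite,
                                   obs_shown_neutral, obs_hidden_neutral):
    """Per-trial corrected overweighting ratio for Answer Shown-Opposite Advice.

    The choice-supportive bias is estimated as the difference between the mean
    updates in the Shown-Neutral and Hidden-Neutral conditions; it is subtracted
    from each observed update before dividing by the per-trial Bayesian update.
    """
    bias = obs_shown_neutral.mean() - obs_hidden_neutral.mean()
    return (obs_shown_opposite - bias) / bayes_shown_opposite
```

A ratio above 1 indicates that the model revised its confidence more strongly than the Bayesian observer would, even after the display-driven confidence boost is removed.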
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The datasets used in this study are publicly available via Zenodo at https://doi.org/10.5281/zenodo.17367223 (ref. 37). The data generated from the experiments are also available at this site.
Code availability
The code used to reproduce the figures in the Article and its Supplementary Information is available via GitHub at https://github.com/dharshsky/llm-confidence-biases and via Zenodo at https://doi.org/10.5281/zenodo.17367223 (ref. 37). The code used to run the experiments cannot be made publicly available in full, as it relies on Google-proprietary infrastructure for model access and inference. However, the Bayesian regression analyses, implemented in PyMC in Python, do not depend on proprietary code and are available via the above GitHub repository.
References
Wei, J. et al. Emergent abilities of large language models. Preprint at https://arxiv.org/abs/2206.07682 (2022).
Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021).
Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proc. Natl Acad. Sci. USA 120, e2218523120 (2023).
Pouget, A., Drugowitsch, J. & Kepecs, A. Confidence and certainty: distinct probabilistic quantities for different goals. Nat. Neurosci. 19, 366–374 (2016).
Xiong, M. et al. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs. Preprint at https://arxiv.org/abs/2306.13063 (2023).
Tian, K. et al. Just ask for calibration: strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing 5433–5442 (2023).
Steyvers, M. et al. What large language models know and what people think they know. Nat. Mach. Intell. 7, 1–11 (2025).
Kapoor, S. et al. Large language models must be taught to know what they don’t know. Adv. Neural Inf. Process. Syst. 37, 85932–85972 (2024).
Griot, M., Hemptinne, C., Vanderdonckt, J. & Yuksel, D. Large language models lack essential metacognition for reliable medical reasoning. Nat. Commun. 16, 642 (2025).
Stone, C., Mattingley, J. B. & Rangelov, D. On second thoughts: changes of mind in decision-making. Trends Cogn. Sci. 26, 419–431 (2022).
Fleming, S. M., Van Der Putten, E. J. & Daw, N. D. Neural mediators of changes of mind about perceptual decisions. Nat. Neurosci. 21, 617–624 (2018).
Folke, T., Jacobsen, C., Fleming, S. M. & De Martino, B. Explicit representation of confidence informs future value-based decisions. Nat. Hum. Behav. 1, 0002 (2016).
Resulaj, A., Kiani, R., Wolpert, D. M. & Shadlen, M. N. Changes of mind in decision-making. Nature 461, 263–266 (2009).
Gemma Team et al. Gemma 3 technical report. Preprint at https://arxiv.org/abs/2503.19786 (2025).
Henkel, L. A. & Mather, M. Memory attributions for choices: how beliefs shape our memories. J. Mem. Lang. 57, 163–176 (2007).
Stocker, A. A. & Simoncelli, E. A Bayesian model of conditioned perception. Adv. Neural Inf. Process. Syst. 20, 1409–1416 (2007).
Jazayeri, M. & Movshon, J. A. A new perceptual illusion reveals mechanisms of sensory decoding. Nature 446, 912–915 (2007).
Sharot, T., De Martino, B. & Dolan, R. J. How choice reveals and shapes expected hedonic outcome. J. Neurosci. 29, 3760–3765 (2009).
Siedlecka, M., Paulewicz, B. & Wierzchoń, M. But I was so sure! Metacognitive judgments are less accurate given prospectively than retrospectively. Front. Psychol. 7, 218 (2016).
Wokke, M. E., Achoui, D. & Cleeremans, A. Action information contributes to metacognitive decision-making. Sci. Rep. 10, 3632 (2020).
Fleming, S. M. et al. Action-specific disruption of perceptual confidence. Psychol. Sci. 26, 89–98 (2015).
Rollwage, M. & Fleming, S. M. Confirmation bias is adaptive when coupled with efficient metacognition. Philos. Trans. R. Soc. Lond. B 376, 20200131 (2021).
Fleming, S. M. & Daw, N. D. Self-evaluation of decision-making: a general Bayesian framework for metacognitive computation. Psychol. Rev. 124, 91–114 (2017).
Kaanders, P., Sepulveda, P., Folke, T., Ortoleva, P. & De Martino, B. Humans actively sample evidence to support prior beliefs. eLife 11, e71768 (2022).
Bonaccio, S. & Dalal, R. S. Advice taking and decision-making: an integrative literature review, and implications for the organizational sciences. Organ. Behav. Hum. Decis. Process. 101, 127–151 (2006).
Yaniv, I. & Kleinberger, E. Advice taking in decision making: egocentric discounting and reputation formation. Organ. Behav. Hum. Decis. Process. 83, 260–281 (2000).
Tversky, A. & Kahneman, D. Judgment under uncertainty: heuristics and biases. Science 185, 1124–1131 (1974).
Christiano, P. F. et al. Deep reinforcement learning from human preferences. Adv. Neural Inf. Process. Syst. 30, 4299–4307 (2017).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Sicilia, A., Inan, M. & Alikhani, M. Accounting for sycophancy in language model uncertainty estimation. In Findings of the Association for Computational Linguistics 7851–7866 (NAACL, 2025).
Perez, E. et al. Discovering language model behaviors with model-written evaluations. In Findings of the Association for Computational Linguistics: ACL 2023 (eds Rogers, A., Boyd-Graber, J. & Okazaki, N.) 13387–13434 (Association for Computational Linguistics, 2023).
Sharma, M. et al. Towards understanding sycophancy in language models. Preprint at https://arxiv.org/abs/2310.13548 (2023).
Wei, J. et al. Measuring short-form factuality in large language models. Preprint at https://arxiv.org/abs/2411.04368 (2024).
Zhang, Z. et al. Multiple-choice questions are efficient and robust LLM evaluators. Preprint at https://arxiv.org/abs/2405.11966 (2024).
Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. In International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1321–1330 (PMLR, 2017).
Ao, S., Rueger, S. & Siddharthan, A. Two sides of miscalibration: identifying over and under-confidence prediction for network calibration. In Uncertainty in Artificial Intelligence (eds Evans, R. J. & Shpitser, I.) 77–87 (PMLR, 2023).
Kumaran, D. et al. Code and data for: Competing biases in LLM confidence dynamics. Zenodo https://doi.org/10.5281/zenodo.17367223 (2025).
Acknowledgements
We thank N. Daw and S. Chan for comments on an earlier version of the paper. We acknowledge the use of Gemini for language editing and improving readability. Funding was provided by Google DeepMind.
Author information
Authors and Affiliations
Contributions
D.K. conceived the project with input from S.M.F. D.K. conceived the experimental design and carried out the experiments. L.M., J.H., A.B. and M.M. provided technical support. V.P., P.V., S.M.F., B.D.M., S.O. and R.P. advised on the project. D.K. wrote the paper, with input from S.M.F., P.V., B.D.M. and V.P.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Taylor Webb, Nicolas Yax and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Methods, Results, Figs. 1–10 and Tables 1–8.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kumaran, D., Fleming, S. M., Markeeva, L. et al. Competing biases underlie overconfidence and underconfidence in LLMs. Nat. Mach. Intell. 8, 614–627 (2026). https://doi.org/10.1038/s42256-026-01217-9