Introduction

Metacognition – “thinking about thinking” – refers to the higher-order cognitive capability that enables us to internally evaluate our own cognitive processes associated with various tasks1,2,3,4,5. Metacognitive abilities allow people to form confidence in the quality of their choices, even in the absence of immediate feedback6,7,8. Decision confidence, elicited around the time of an individual choice, can be conceptualized as the subjective probability of success given the available evidence (i.e., the Bayesian posterior probability of the chosen option being correct)9,10,11,12,13. Previous studies in psychology and neuroscience have focused on the formation of confidence in perceptual6,14,15,16,17 and knowledge-based decisions13. Participants in these experiments were asked to indicate their confidence in the correctness of their judgment after each choice. Results showed that self-reported confidence was well captured by the posterior probability of the choice being correct18,19,20. Abnormalities in decision confidence are known to be related to psychiatric disorders21,22.

More recent studies in metacognition have examined whether and how human participants aggregate decision confidence in individual choices to form a “global” self-performance estimate (SPE). An SPE is an overall belief about one’s skills and capacity to solve a given task. University students, for example, might integrate their performance in different assignments and exams to form an estimate of their overall academic ability. SPEs play a key role in shaping our behaviours23, selecting goals to pursue24, and allocating time and resources25. For instance, after repeatedly failing to achieve an objective, many people stop investing further time or effort in it and turn to another, more achievable goal.

Rouault and colleagues assessed how SPEs in perceptual decisions are constructed across multiple decisions18,19. They found that feedback about the correctness of decisions significantly affected SPEs in a simple, perceptual decision-making task: participants underestimated their performance (i.e., had lower SPEs) in the absence of feedback compared to the presence of feedback, even though their objective performance was not affected by whether feedback was provided18. Computational modelling further suggested that participants formed global SPEs by integrating the decision confidence of each choice over multiple trials in a Bayesian manner18.

Unlike simple perceptual decisions, many decisions we make are computationally hard. Suppose you have to choose the set of items in a supermarket that maximises your satisfaction while sticking to your budget constraint. In a supermarket stocking just 100 items, assuming that you pick each item at most once, you would need to consider \(2^{100}\) combinations of items to find the optimal set, a computationally intractable task. Another example is constructing an investment portfolio by picking stocks listed on the market given a budget or risk constraint26,27. There are currently more than 41,000 stocks listed on stock exchanges around the world28, which makes building portfolios another intractable task. While this kind of decision is ubiquitous in life, which is reflected in modern choice theories (e.g., utility maximization29 and satisficing30), it requires individuals to solve intractable, that is, computationally complex, problems31.

The mathematical framework of computational complexity allows the quantification of computational hardness – or computational resource requirements – of tasks such as decisions32. Behavioral studies to date have shown that the level of computational complexity is negatively related to human performance in decisions: the higher the level of computational complexity, that is, the more computational resources are required to compute the optimal course of action, the less likely people are to choose that course of action33. Importantly, the computational complexity of tasks, including decisions, can be quantified without reference to any particular algorithm – it is an objective attribute of a task. Furthermore, computational complexity can be quantified in a domain-general way; that is, measures of computational complexity are not task-specific. It has been shown that such objective, task-independent measures of computational complexity predict both the extent of cognitive effort exerted on a decision and performance34. However, little is known about ‘metacognition’ in complex decisions: whether and how humans internally evaluate their own performance in computationally complex decisions remains elusive.

In the present study, we investigated the formation of SPEs in computationally complex economic decisions with reference to that in simple perceptual decisions. In line with previous studies on perceptual decision-making18,19, we examined the effects of decision difficulty and the presence/absence of feedback on SPEs. To this end, we conducted a behavioral experiment in which participants solved both a simple, perceptual decision-making task (sampling task; Fig. 1a)18,35 and a complex, economic decision-making task (knapsack task; Fig. 1b)34,36. The latter is based on the 0-1 knapsack problem, a canonical (NP-complete) constrained optimization problem34,37, in which task difficulty is manipulated via a theoretical measure of computational hardness (i.e., “instance complexity”34,36). The knapsack task is conceptually closely related to many daily decisions (e.g., attention allocation, budgeting, task management). The experiment consisted of four different types of blocks (Fig. 1e): two difficulty levels (easy vs. hard) crossed with two feedback conditions (presence vs. absence of feedback). In each block, participants performed 10 trials of a task (Fig. 1f). Crucially, at the end of each block, participants submitted a bid in a Becker–DeGroot–Marschak (BDM) auction38 to purchase the right to complete an additional five trials (Fig. 1c, f). Note that, in these additional trials, no feedback was provided, and the difficulty level was the same as in the first 10 trials of the block. The auction procedure allowed us to measure a participant’s SPEs in an incentive-compatible manner (see Methods for details). If a participant thought that they could perform well in the additional trials, they should have bid high to secure the chance of attempting them. Participants were also asked to report an estimate of their performance on a Likert scale at the end of each block (Fig. 1d, f).

Fig. 1

Experimental design. (a) Sampling task (simple perceptual decision-making). In each trial, participants indicated which stimulus, left or right, contained the larger number of dots. Task difficulty was manipulated by the difference in the number of dots between the two stimuli. (b) Knapsack task (complex economic decision-making). In each trial, participants solved an instance of the 0-1 knapsack decision problem with six items, each of which had two attributes (weight (kg) and value ($)). Participants had to indicate whether there exists a subset of items for which (i) the sum of weights is less than or equal to the weight constraint (the kilogram amount shown in the centre circle) and (ii) the sum of values is at least as high as the target profit (the dollar amount at the centre). Task difficulty was manipulated by a theoretical measure of computational hardness, instance complexity. (c) BDM auction. At the end of each block, participants bid for the right to play an additional five bonus trials. (d) Self-reported performance rating. At the end of each block, participants were asked to estimate their task performance in the additional five bonus trials if attempted. (e) Block types. For each task, we employed a block-wise two-by-two design with two difficulty levels (easy vs. hard) and two feedback conditions (feedback vs. no-feedback). (f) Schedule of a block. Schematic examples of an easy block with feedback (top) and a hard block without feedback (bottom) are shown. In each block, participants performed 10 trials of a decision-making task (sampling or knapsack task) and then bid in the BDM auction to purchase the right to complete an additional five bonus trials. Note that, in the additional trials, no feedback was provided, and the difficulty level was the same as in the first 10 trials of the block.

Results

Pilot experiment

We first conducted a pilot experiment (N = 17) to determine the sample size of the main experiment. In the sampling task, difficulty affected both objective performance (i.e., the proportion of correct choices; effect of difficulty: b =  − 0.435 \(\pm\) 0.066, Pcorr < 0.001; see Fig. 2a and Table S1) and SPEs (i.e., bid value in the BDM auction; effect of difficulty: b =  − 0.071 \(\pm\) 0.013, Pcorr < 0.001; see Fig. 2b and Table S2). As expected, they were lower in hard trials than in easy trials. Moreover, SPEs were lower in the no-feedback condition compared to the feedback condition (effect of feedback: b = 0.050 \(\pm\) 0.013, Pcorr < 0.001; see Fig. 2b and Table S2), although objective performance remained unchanged (effect of feedback: b = 0.070 \(\pm\) 0.065, P = 0.284; see Fig. 2a and Table S1). We observed a similar pattern of SPEs when considering self-reported performance ratings instead of BDM auction bids (effect of difficulty: b =  − 0.331 \(\pm\) 0.065, Pcorr < 0.001; effect of feedback: b = 0.360 \(\pm\) 0.065, Pcorr < 0.001; see Fig. 2c and Table S3). Overall, participant behavior in the sampling task was consistent with a previous study based on a similar task18.

Fig. 2

Results of the pilot experiment. (a) Objective performance in the sampling task. Proportions of correct choices are plotted by block type. Bars indicate means \(\pm\) SEM across participants. Each grey dot denotes one participant. FB, feedback; NoFB, no-feedback. (b) Self-performance estimate (bid value) in the sampling task. Bid values in the BDM auction are plotted by block type. (c) Self-performance estimate (self-reported rating) in the sampling task. Self-reported performance ratings are plotted by block type. (d) Objective performance in the knapsack task. The format is the same as in panel (a). (e) Self-performance estimate (bid value) in the knapsack task. The format is the same as in panel (b). (f) Self-performance estimate (self-reported rating) in the knapsack task. The format is the same as in panel (c).

In the knapsack task, objective performance was lower in hard trials compared to easy trials, but performance was not affected by the presence/absence of feedback (effect of difficulty: b = − 0.799 \(\pm\) 0.074, Pcorr < 0.001; effect of feedback: b = − 0.026 \(\pm\) 0.074, Pcorr = 0.724; see Fig. 2d and Table S4). SPEs (bid values) were lower in hard trials compared to easy trials (effect of difficulty: b = − 0.090 \(\pm\) 0.013, Pcorr < 0.001; see Fig. 2e and Table S5). They were also lower in the absence of feedback (compared to the presence of feedback), but only in easy trials (effect of feedback: b = 0.036 \(\pm\) 0.013, Pcorr = 0.024; interaction effect: b = − 0.036 \(\pm\) 0.013, Pcorr = 0.024; see Fig. 2e and Table S5), which differs from the pattern observed in the sampling task (Fig. 2b). The same pattern was present in self-reported performance ratings (effect of difficulty: b = − 0.463 \(\pm\) 0.071, Pcorr < 0.001; effect of feedback: b = 0.257 \(\pm\) 0.071, Pcorr < 0.001; interaction effect: b = − 0.199 \(\pm\) 0.071, Pcorr = 0.018; see Fig. 2f and Table S6).

We then determined the sample size of the main experiment to ensure a statistical power of at least 0.8 for the effects of interest (N = 36; see Methods for details).

Main experiment: objective performance

We first examined how objective performance in decisions was affected by difficulty and by feedback about the correctness of a decision. In both the sampling and the knapsack task, objective performance was lower in hard trials compared to easy trials (Fig. 3; effect of difficulty in the sampling task: b = − 0.582 \(\pm\) 0.048, Pcorr < 0.001; knapsack task: b = − 0.861 \(\pm\) 0.055, Pcorr < 0.001; see Tables S7 and S8 for details). There was no significant difference in performance in the absence of feedback compared to the presence of feedback (Fig. 3; effect of feedback in the sampling task: b = − 0.005 \(\pm\) 0.048, P = 0.919; knapsack task: b = − 0.014 \(\pm\) 0.055, P = 0.801; Tables S7 and S8).

Fig. 3

Objective performance in the main experiment. (a) Objective performance in the sampling task (proportion of correct choices by block type). Bars indicate mean \(\pm\) SEM across participants. Each grey dot denotes one participant. FB feedback, NoFB no feedback. (b) Effects of difficulty, feedback, and their interaction on objective performance in the sampling task. Effects were estimated with a generalized linear mixed-effect model (see Table S7). ***P < 0.001, **P < 0.01 (all p-values are Bonferroni corrected). (c) Objective performance in the knapsack task. The format is the same as in panel (a). (d) Effects of difficulty, feedback, and their interaction on objective performance in the knapsack task (see Table S8). The format is the same as in panel (b).

We confirmed that participants were actively engaged in solving hard trials, rather than making random choices. In both the sampling and the knapsack task, participants’ objective performance in hard trials was significantly higher than chance level (i.e., 0.5) (intercept: b = 0.715 \(\pm\) 0.073, Pcorr < 0.001 in the sampling task, and b = 0.646 \(\pm\) 0.067, Pcorr < 0.001 in the knapsack task).

Main experiment: self-performance estimates

In the sampling task, SPEs (bid values in the BDM auction at the end of each block) were significantly lower in hard trials compared to easy trials (effect of difficulty: b =  − 0.044 \(\pm\) 0.008, Pcorr < 0.001; Fig. 4a, b; Table S9) and in the absence of feedback compared to the presence of feedback (effect of feedback: b = 0.060 \(\pm\) 0.008, Pcorr < 0.001; Fig. 4a, b; Table S9). There was no significant interaction between difficulty and feedback conditions (interaction effect: b =  − 0.018 \(\pm\) 0.008, Pcorr = 0.085; Fig. 4a, b; Table S9). These findings replicate the results of previous studies18,35.

Fig. 4

Self-performance estimates (bid values) in the main experiment. (a) Self-performance estimates in the sampling task. Bars indicate mean \(\pm\) SEM across participants. The bid value in the BDM auction is plotted by block type. The format is the same as in Fig. 2b. (b) Effects of difficulty, feedback, and their interaction on self-performance estimates in the sampling task. Coefficients were estimated with generalized linear mixed-effect models (see Table S9). ***P < 0.001, **P < 0.01 (all p-values are Bonferroni corrected). (c) Self-performance estimates in the knapsack task. Bars indicate mean \(\pm\) SEM across participants. The format is the same as in Fig. 2e. (d) Effects of difficulty, feedback, and their interaction on self-performance estimates in the knapsack task (see Table S10). The format is the same as in panel (b).

In the knapsack task, SPEs were lower in hard compared to easy trials (effect of difficulty: b =  − 0.077 \(\pm\) 0.009, Pcorr < 0.001; Fig. 4c, d; Table S10). SPEs were also lower in the absence of feedback compared to the presence of feedback but only in easy trials (Fig. 4c), indicated by the significant interaction between difficulty and feedback (b =  − 0.030 \(\pm\) 0.009, Pcorr = 0.005; Fig. 4d; Table S10).

We then directly tested the differential effects of feedback on SPEs in hard trials between the sampling and knapsack tasks (i.e., that SPEs were lower in the absence of feedback compared to the presence of feedback only in the sampling task). This additional analysis revealed that, in hard trials, the effects of feedback differed between the two tasks, as indicated by the significant interaction of task type and feedback (b = − 0.027 \(\pm\) 0.009, Pcorr = 0.007; Table S11). We also confirmed that the interaction effect remained significant after controlling for potential effects of the participants’ objective performance (Tables S12–S14).

As a robustness check, we confirmed that our findings in relation to SPEs are robust to the measurement of self-performance. The patterns of SPEs were similar when considering self-reported performance ratings instead of BDM auction bids (Fig. 5). In both tasks, self-performance ratings were lower in hard compared to easy trials (effects of difficulty: b = − 0.267 \(\pm\) 0.044, Pcorr < 0.001 in the sampling task, and b = − 0.400 \(\pm\) 0.044, Pcorr < 0.001 in the knapsack task; see Tables S15 and S16). In the sampling task, self-performance ratings were lower in the absence of feedback compared to the presence of feedback (effect of feedback: b = 0.288 \(\pm\) 0.044, Pcorr < 0.001; Table S15). In the knapsack task, self-performance ratings were also lower in the absence of feedback compared to the presence of feedback but only in easy trials (Fig. 5c), as indicated by the significant interaction effect between difficulty and feedback (b = − 0.163 \(\pm\) 0.044, Pcorr < 0.001; Table S16). In hard trials, the effects of feedback on self-performance ratings differed between the two tasks, as indicated by the significant interaction effect between task type and feedback (b = − 0.139 \(\pm\) 0.045, Pcorr = 0.007; see Table S17). This interaction effect also remained significant after controlling for participants’ objective performance (Tables S18–S20).

Fig. 5

Self-performance estimates (self-reported ratings) in the main experiment. (a) Self-performance estimates in the sampling task. Bars indicate mean \(\pm\) SEM across participants. The self-performance rating is plotted by block type. The format is the same as in Fig. 2c. (b) Effects of difficulty, feedback, and their interaction on self-performance estimates in the sampling task. Coefficients were estimated with generalized linear mixed-effect models (see Table S15). ***P < 0.001, *P < 0.05 (all p-values are Bonferroni corrected). (c) Self-performance estimates in the knapsack task. Bars indicate mean \(\pm\) SEM across participants. The format is the same as in Fig. 2f. (d) Effects of difficulty, feedback, and their interaction on self-performance estimates in the knapsack task (see Table S16). The format is the same as in panel (b).

Moreover, we examined potential effects of learning or experience. To this end, we included a factor variable representing the order of the blocks (i.e., 1 for the 1st block, 2 for the 2nd block, and so on) in our GLMMs. This additional analysis revealed that the main results above were not affected by block order (Tables S21–S24). These results suggest that our findings are robust to the choice of SPE measurement and to block order.

Discussion

This study investigated metacognition in computationally complex decision-making. Specifically, we examined the effects of task difficulty and of feedback about the correctness of a decision on the formation of global SPEs when people make cognitively simple perceptual decisions and computationally complex economic decisions (i.e., the sampling and knapsack tasks, respectively).

We replicate previous findings about the effects of task difficulty and the presence of feedback in perceptual decision-making18,35. In the sampling task, objective performance was lower in harder trials but was not modulated by the presence of feedback (Fig. 3a, b). Furthermore, SPEs were modulated by both task difficulty and the presence of feedback (Fig. 4a, b): they were lower in harder trials and in the absence of feedback (compared to trials with feedback; i.e., a reduction of SPEs in the absence of feedback). Most studies of metacognition in perceptual decision-making to date have examined local confidence in individual decisions17,22,39. Only a small number of studies have investigated global SPEs18,19,35,40. Our findings thus contribute to the growing literature on the formation of SPEs in perceptual decision-making18,19,35.

In the knapsack task, a complex economic decision-making task, we found that objective performance was modulated by task difficulty, defined based on a theoretical measure of computational hardness (instance complexity) (Fig. 3c, d). This finding is consistent with findings about the effects of instance complexity on performance demonstrated by previous studies34,36. The present study thus contributes to the growing literature highlighting the importance of formal measures of computational complexity as a predictor of human performance in complex decisions34,36.

Similar effects of the theoretical measure of computational hardness were found on SPEs in the knapsack task. Specifically, we demonstrated that SPEs were lower in harder trials, that is, trials with lower instance complexity (instance complexity is inversely correlated with computational hardness34,36) (Fig. 4c, d). This result indicates that participants were able to sense task difficulty while evaluating the quality of their complex decisions, even though the difficulty level was not explicitly indicated to them in the experiment. To the best of our knowledge, this is the first demonstration that people are capable of internally evaluating their own performance in computationally complex decisions, similar to perceptual decisions18,19, knowledge-based judgment13, motor control41,42, and memory retrieval43.

Moreover, we show that, in the knapsack task, feedback modulated SPEs differently in easy and hard trials (Fig. 4c, d). In easy trials, participants displayed lower SPEs in the absence of feedback compared to the presence of feedback. This pattern of SPEs in easy trials is consistent with the genuine reduction of SPEs in the absence of feedback observed in the sampling task18, despite the substantial difference in the cognitive burden required to solve the sampling (i.e., counting dots and computing the difference) and knapsack (i.e., combinatorial optimisation) tasks. Notably, a previous study of perceptual decision-making18 suggests that such a pattern of SPEs may result from a metacognitive process in which people integrate information about feedback (that is, in the presence of feedback) and decision confidence (in the absence of feedback) over multiple trials in a Bayesian manner.

In hard trials of the knapsack task, on the other hand, we did not observe any modulation of SPEs by feedback (Fig. 4c). Both in the presence and absence of feedback, participants displayed low SPEs, around chance level, even though objective performance was above chance level (Fig. 3c). This pattern cannot be explained by the same kind of Bayesian integration18 that can account for the pattern of SPEs observed in easy trials (Fig. 4c) and in the sampling task (Fig. 4a). The pattern of SPEs observed in hard trials can potentially be explained by metacognitive control of cognitive resources. When participants are making hard decisions, a larger proportion of their (limited) cognitive resources is allocated to the decision-making process, leaving few or no resources for metacognitive and other cognitive processes. In this case, not enough resources are allocated to attending to or memorizing feedback (in the presence of feedback) and decision confidence (in the absence of feedback), leading to a failure to incorporate such information into estimates of global self-performance. As a result, participants exhibited low SPEs, around chance level, in hard trials regardless of the presence or absence of feedback. If this speculation is correct, participants should exhibit more accurate SPEs (i.e., higher discriminability with respect to objective performance) in easy compared to hard trials in complex decision-making. The discriminability of SPEs (the correlation between SPEs and performance) was indeed slightly greater in easy trials than in hard trials (easy trials: r = 0.266 \(\pm\) 0.106, mean \(\pm\) SEM across participants; hard trials: r = 0.216 \(\pm\) 0.111), which is at least to some degree consistent with our speculation, although the difference was not statistically significant (P = 0.752, two-tailed paired t-test). These findings suggest that distinct cognitive mechanisms may be involved in the estimation of global self-performance in easy and hard problems in computationally complex decision-making. Future work is needed to examine this conjecture in more detail.
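For illustration, the discriminability check described above can be sketched in R as follows; the data frame blocks and its columns (participant, difficulty, bid, accuracy) are hypothetical names for block-level knapsack data, not the variables in our analysis code.

```r
# Sketch: per-participant Pearson correlation between block-level SPEs (bids)
# and block-level accuracy, computed separately for easy and hard knapsack
# blocks, then compared with a two-tailed paired t-test.
per_participant_r <- function(blocks, level) {
  sub <- blocks[blocks$difficulty == level, ]
  sapply(split(sub, droplevels(factor(sub$participant))),
         function(d) cor(d$bid, d$accuracy))
}

r_easy <- per_participant_r(blocks, "easy")
r_hard <- per_participant_r(blocks, "hard")
t.test(r_easy, r_hard, paired = TRUE)  # two-tailed by default
```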

In both the sampling and knapsack tasks, SPEs were approximately at chance level (i.e., 0.5) in hard trials without feedback, while objective performance exceeded chance level (Fig. 4a, c). This phenomenon could be related to previous reports suggesting that metacognitive insight often lags behind above-chance objective performance when no feedback is provided44,45.

Our study has several limitations. First, we collected only a limited number of SPEs (i.e., two data points per condition and task), owing to the relatively long duration of a trial in the knapsack task (approximately 30 s). This makes it difficult to implement advanced data analyses (e.g., computational model fitting) to further elucidate the possible mechanisms underlying the formation of SPEs. Second, the current study used the knapsack decision problem as a representative of computationally intractable problems. It remains an open question whether the results found here generalize to other computationally intractable problems. To explore this question, future studies should test the generalizability of the patterns of SPEs found in the knapsack decision task to other NP-complete problems such as the Boolean satisfiability problem (k-SAT)46,47,48 and the traveling salesperson problem (TSP)49. Lastly, the attribution of the differences in SPEs between the two types of decision-making in the current study needs further investigation. The difference between the two tasks may not be limited to computational difficulty: the tasks also differed in perceptual uncertainty, as the stimuli in the perceptual task were flashed on the screen only briefly, whereas in the knapsack task they were displayed for much longer. The difference could also be attributed to the difference in objective performance between the two tasks, as participants had slightly higher accuracy in the knapsack task than in the sampling task (Fig. 3a, c). Another possibility is a difference in participants’ familiarity with the two types of tasks. While we did not recruit individuals with prior experience in the knapsack or sampling task from our past experiments, we cannot strictly rule out this possibility.

In conclusion, this study provides insights into metacognition in complex decision-making by examining the effects of task difficulty and feedback on global SPEs in a computationally hard task. In contrast to perceptual decisions, our findings suggest that people estimate their global performance in complex decisions through distinct metacognitive mechanisms for easy and hard instances. We believe that the current study is the first step towards a comprehensive understanding of how humans learn about their own performance when making complex decisions.

Methods

The study protocol was approved by the University of Melbourne Human Research Ethics Committee (Ethics ID 1954681.1), and all analyses were conducted in accordance with relevant guidelines and regulations.

Participants

We recruited 17 and 36 participants from the University of Melbourne for the pilot and main experiments, respectively (pilot experiment: five males, age range = 19–38, mean \(\pm\) SD = 23.71 \(\pm\) 3.26 years; main experiment: 15 males, age range = 18–32, mean \(\pm\) SD = 22.31 \(\pm\) 5.19 years). The sample size of the main experiment was determined by a power analysis based on the pilot data (see the section Data analysis below). All participants gave their informed consent.

Experimental task

Overview

Participants performed two decision-making tasks in the experiment (in-person for the pilot experiment and online for the main experiment): the sampling task (Fig. 1a) and the knapsack task (Fig. 1b). Details of both tasks are provided below.

The experiment consisted of 16 blocks (eight blocks for each of the two tasks). The blocks differed with regard to two factors: difficulty (easy or hard) and the presence of feedback about the correctness of a decision (presence or absence of feedback). The factorial design resulted in four types of blocks (Fig. 1e): easy-feedback, easy-no feedback, hard-feedback, hard-no feedback. Each combination of the four block types and the two decision-making tasks was presented twice across the 16 blocks of the experiment, in a random order.

In each block, participants solved 10 trials of the same level of difficulty (Fig. 1f). After that, they were asked to indicate their willingness to pay for the chance of attempting an additional five trials of the same difficulty level as the preceding 10 trials in the same block, using a Becker–DeGroot–Marschak (BDM) auction50 (Fig. 1c). The auction procedure enabled us to measure a participant’s self-performance estimates in an incentive-compatible manner38. Note that no feedback was provided in the additional bonus trials.

Participants were instructed at the beginning of the experiment that it would take approximately 1.5 h and that they could not leave early even if they finished the tasks sooner.

Sampling task

In each trial, participants saw two boxes on the screen that contained a number of dots. Participants were asked to decide which of the two boxes contained the larger number of dots18,19,22,35 (Fig. 1a). The difficulty of the decision was modulated by adjusting the difference in the number of dots between the two boxes (the differences were 46 and 18 dots in easy and hard trials, respectively). Each trial started with a fixation cross displayed at the centre of the screen (1.2 s). On the subsequent screen, two black boxes containing different numbers of white dots were briefly flashed (300 ms). Participants then had 3 s to indicate which of the two boxes they thought contained the larger number of dots.

Knapsack task

Participants solved 80 trials of the 0–1 knapsack decision problem34,36 (Fig. 1b). Each trial started with a screen displaying a fixation cross at the centre (1.2 s). On the next screen, six items were presented at random locations. Each item had two attributes: a value and a weight. Item values (in dollars) were indicated with dollar bills, and item weights (in kilograms) were indicated with a black weight symbol. The sizes of the dollar bills and weight symbols were proportional to the value and weight they represented, respectively. The weight limit (capacity) of the knapsack in kilograms and the target profit in dollars were displayed inside a green circle at the centre of the screen, and the green circle indicated the time remaining in the trial (maximum of 25 s). Participants were asked to indicate whether there exists a subset of items whose total weight does not exceed the capacity while its total value is at least as high as the target profit. Once participants had solved the instance, they indicated their response (“YES” or “NO”) by pressing the corresponding button on the keyboard (maximum of 3 s).
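To make the decision problem concrete, the following R sketch checks a single hypothetical six-item instance by brute force; the item values, weights, capacity, and target are illustrative numbers, not stimuli from the experiment.

```r
# Brute-force check of a 0-1 knapsack *decision* instance: does any subset of
# items have total weight <= capacity and total value >= target?
# With six items there are only 2^6 = 64 subsets to enumerate.
knapsack_decision <- function(values, weights, capacity, target) {
  n <- length(values)
  for (k in 0:(2^n - 1)) {
    in_set <- bitwAnd(k, 2^(0:(n - 1))) > 0   # items included in subset k
    if (sum(weights[in_set]) <= capacity && sum(values[in_set]) >= target) {
      return(TRUE)   # a satisfying subset exists ("YES" instance)
    }
  }
  FALSE              # no subset satisfies both constraints ("NO" instance)
}

# Example (illustrative): returns FALSE, because the best value attainable
# under the 20 kg capacity is 85 < 90.
knapsack_decision(values  = c(10, 40, 30, 50, 35, 25),
                  weights = c( 5, 10,  6, 12,  8,  7),
                  capacity = 20, target = 90)
```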

Following previous studies34, we defined the difficulty of each trial in terms of ‘instance complexity’ (IC): the distance between the target profit and the maximum profit attainable in the corresponding instance of the optimization variant of the 0-1 knapsack problem. Note that IC is inversely correlated with computational hardness, which means a lower value of IC corresponds to a higher level of computational hardness. Specifically,

$$IC = \left|\frac{p - p^{*}}{\sum_{i} v_{i}}\right|$$

where \(p\) is the target profit of the problem, \({p}^{*}\) is the maximum profit achievable in the corresponding optimization instance (i.e., the highest total value attainable with the given set of items under the given weight constraint), and \({v}_{i}\) denotes the value of the i-th item. Note that IC depends on the weight information through \({p}^{*}\), even though weights do not appear explicitly in the formula (see Text S1). The values of IC were 0.16 and 0.01 in easy and hard trials, respectively.
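For illustration, IC can be computed for a hypothetical six-item instance as follows; as in the sketch above, \({p}^{*}\) is obtained by brute force over all \(2^{6}\) subsets, and the numbers are illustrative rather than taken from the experimental stimuli.

```r
# Maximum attainable profit p* of the optimization variant, by brute force.
max_profit <- function(values, weights, capacity) {
  n <- length(values); best <- 0
  for (k in 0:(2^n - 1)) {
    in_set <- bitwAnd(k, 2^(0:(n - 1))) > 0
    if (sum(weights[in_set]) <= capacity)
      best <- max(best, sum(values[in_set]))
  }
  best
}

# IC = |p - p*| / sum(v_i), as defined above.
instance_complexity <- function(values, weights, capacity, target) {
  p_star <- max_profit(values, weights, capacity)
  abs((target - p_star) / sum(values))
}

# Example (illustrative): p* = 85, so IC = |90 - 85| / 190, roughly 0.026.
instance_complexity(values  = c(10, 40, 30, 50, 35, 25),
                    weights = c( 5, 10,  6, 12,  8,  7),
                    capacity = 20, target = 90)
```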

Self-performance estimates (SPE)

At the end of each block, participants were asked to indicate their SPEs in two ways: a bid in a BDM auction and a self-reported rating on a Likert scale. After finishing the 10 trials in a block, a participant was endowed with A$1 and made a bid for the chance of attempting an additional five trials. These additional five trials were of the same difficulty level as the first 10 trials of the preceding block. A participant was rewarded with A$0.2 for each correct decision. In each BDM auction, a counter-bid was drawn randomly from the range A$0–A$1. If the participant’s bid was equal to or greater than the counter-bid, the participant paid the counter-bid and could attempt the additional five trials. Otherwise, the participant kept the initial endowment of A$1 and left the block. This auction mechanism is incentive-compatible in the sense that the optimal strategy for a participant is to always bid the number closest to their true belief about the proportion of additional trials they think they can solve correctly (i.e., their SPE; see Text S2). Participants were explicitly instructed in the optimal strategy, and we confirmed with a questionnaire that they correctly understood it. Participants were also asked to report their self-performance directly. Specifically, a participant was asked to indicate, on a 6-point Likert scale (0 = none of them, 5 = all of them), how many of the additional trials they thought they could solve correctly. Note that self-reported performance ratings were highly correlated with bid values (sampling task: \(r=\) 0.752 \(\pm\) 0.050, mean \(\pm\) SEM across participants; knapsack task: \(r=\) 0.760 \(\pm\) 0.043), suggesting that the two measures capture consistent, though not identical, constructs. The order of the BDM auction and the Likert rating was randomized across blocks and participants.
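The payoff rule of the auction can be summarized in the following R sketch (function and argument names are illustrative; the experiment itself was implemented separately).

```r
# BDM auction payoff for one block: endowment A$1, uniform counter-bid on
# [0, 1], and A$0.20 per correct decision in the five bonus trials.
bdm_outcome <- function(bid, n_correct_bonus, endowment = 1.0, reward = 0.2) {
  counter_bid <- runif(1, min = 0, max = 1)
  if (bid >= counter_bid) {
    # Bid wins: pay the counter-bid and play the five bonus trials.
    endowment - counter_bid + reward * n_correct_bonus
  } else {
    # Bid loses: keep the endowment and skip the bonus trials.
    endowment
  }
}

# Example: a participant expecting to solve 4 of 5 bonus trials correctly
# maximizes expected payoff by bidding their expected bonus earnings (A$0.8).
bdm_outcome(bid = 0.8, n_correct_bonus = 4)
```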

Reward payment

Participants received a show-up fee of A$10 and a payment of A$0.2 for each correct answer in both the first 10 trials and the 5 bonus trials of each block.

Data analysis

Pilot experiment

We examined the effects of difficulty and feedback on trial-by-trial objective performance and block-by-block SPEs (i.e., BDM bid value) using generalized linear mixed-effect models (GLMMs). We modelled objective performance as follows: logit P(correct) ~ 1 + D + F + D × F + (1 | participants) (see Tables S1 and S4). Here, P(correct) was coded as 1 if the participant chose the correct option and 0 otherwise. D denotes difficulty (coded as −1 for easy and 1 for hard problems), F denotes feedback (coded as −1 for problems without feedback and 1 for problems with feedback), and D × F indicates their interaction. We added participant random effects on the intercept. We modelled global SPEs as follows (see Tables S2–S3 and Tables S5–S6): BDM value ~ 1 + D + F + D × F + (1 | participants), with participant random effects on the intercept.
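For reference, the two model specifications above can be written in lme4 syntax as follows; the models reported here were estimated in MATLAB with fitglme (see below), the data-frame and column names are illustrative, and the bid model is shown as a Gaussian mixed model for simplicity.

```r
library(lme4)

# Trial-level objective performance: logistic GLMM with difficulty (D, coded
# -1/1), feedback (F, coded -1/1), their interaction, and a participant
# random intercept.
m_correct <- glmer(correct ~ D * F + (1 | participant),
                   data = trials, family = binomial(link = "logit"))

# Block-level SPE (BDM bid value), same fixed and random effects.
m_bid <- lmer(bid ~ D * F + (1 | participant), data = blocks)

summary(m_correct)
summary(m_bid)
```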

Determination of the sample size

We determined the sample size of the main experiment based on the estimates of the GLMMs described above, using the glmer (lme4) and powerCurve (simr) functions in R, aiming for a statistical power of at least 0.8 at a significance level of α = 0.05/3 for the effects of interest, i.e., the effects of difficulty, feedback, and their interaction on SPEs in the knapsack task. The required sample size was N = 36.
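A sketch of such a power calculation in R is shown below, using lme4 and simr (the packages providing glmer/lmer and powerCurve, respectively); the pilot data frame, the candidate sample sizes, and the number of simulations are illustrative choices rather than our exact settings.

```r
library(lme4)
library(simr)

# Fit the pilot-data model for the SPEs in the knapsack task (illustrative names).
m_pilot <- lmer(bid ~ D * F + (1 | participant), data = pilot_knapsack)

# Bonferroni-corrected significance level used in the power simulations.
simrOptions(alpha = 0.05 / 3)

# Extend the fitted model to larger hypothetical samples and simulate power
# for the difficulty effect across candidate sample sizes; analogous calls
# can be made for the feedback and interaction effects.
m_ext <- extend(m_pilot, along = "participant", n = 48)
pc <- powerCurve(m_ext, test = fixed("D"), along = "participant",
                 breaks = seq(16, 48, by = 4), nsim = 200)
print(pc)  # choose the smallest N with estimated power >= 0.8
```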

Main experiment

We examined the effects of difficulty and feedback on trial-by-trial objective performance and block-by-block global SPEs using the same models we used for the analysis of the pilot data (see above).

We also compared participants’ global SPEs in hard blocks across the two types of tasks by performing additional GLMM analyses. First, we modelled global SPEs as follows: BDM value ~ 1 + T + F + T × F + (1 | participants). Here, T denotes the task type (coded as −1 for the sampling task and 1 for the knapsack task), F denotes the feedback (coded as −1 for problems without feedback and 1 for problems with feedback), and T × F indicates their interaction. We added participant random effects on the intercept. Second, to exclude the possibility that the differential effects of feedback on SPEs in hard trials between the two tasks can be attributed to differences in participants’ objective performance across tasks, we modelled SPEs as BDM value ~ 1 + T + F + T × F + P + (1 | participants), where P indicates participants’ objective performance (i.e., the proportion of correct choices) in each of the hard blocks. Furthermore, we tested whether the differential effects of feedback on SPEs in hard trials between the two tasks can be attributed to differences in participants’ objective performance in the first and second halves of each hard block by modelling SPEs as BDM value ~ 1 + T + F + T × F + P1 + (1 | participants), where P1 indicates participants’ objective performance in the first half of each hard block, and BDM value/self-performance rating ~ 1 + T + F + T × F + P2 + (1 | participants), where P2 indicates participants’ objective performance in the second half of each hard block.
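In lme4 syntax, these additional models take the following form (illustrative sketch; hard_blocks and its columns are hypothetical names, and T and F here are data columns matching the notation above, not R’s logical constants).

```r
library(lme4)

# Hard-block comparison across tasks, with optional objective-performance
# covariates P (whole block), P1 (first half), and P2 (second half).
m_task      <- lmer(bid ~ T * F +      (1 | participant), data = hard_blocks)
m_task_perf <- lmer(bid ~ T * F + P  + (1 | participant), data = hard_blocks)
m_task_p1   <- lmer(bid ~ T * F + P1 + (1 | participant), data = hard_blocks)
m_task_p2   <- lmer(bid ~ T * F + P2 + (1 | participant), data = hard_blocks)
```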

We also compared participants’ objective performance in hard trials to chance level (i.e., 0.5) by performing a GLMM analysis. We modelled participants’ performance in hard trials as follows: logit P(correct) ~ 1 + F + (1 | participants), where P(correct) was coded as 1 if the participant chose the correct option and 0 otherwise. If participants’ accuracy is greater than chance, the intercept term should be greater than 0.

To examine the robustness of our main results, we added block order as an explanatory variable of no interest to the models of both trial-by-trial objective performance and block-by-block SPEs (see Tables S21–S24).

The power analysis was conducted in R (version 4.2.1) using the glmer function (lme4, version 1.1.30) and the powerCurve function (simr, version 1.0.6). The GLMMs were estimated in MATLAB (version R2022a) using the fitglme function.