Fig. 3: Computation of choice and confidence in the POMDP model of the direction discrimination task.

a Expected confidence gain from one additional observation as a function of the inferred mean coherence, μt, and elapsed time, t. b An example POMDP decision policy when each new observation incurs a constant cost. The yellow area marks the belief states where the optimal action is to continue observing; the purple area marks the belief states where the POMDP model terminates and commits to a choice. c Confidence as a function of inferred mean coherence, μt, and time, t. d The ratio of reward utilities for the sure-bet and correct direction choices determines the POMDP policy for choosing the sure-bet option. This policy can be visualized as phase boundaries overlaid on the confidence map of c. The blue region marks combinations of inferred coherence and time, (μt, t), for which the model chooses the sure-bet target; the red region marks those for which a direction target is chosen. The thresholds separating low and high confidence ratings are thus the boundaries between the blue (low confidence) and red (high confidence) regions. Solid white lines show the two decision termination bounds at which the model stops gathering observations and commits to a decision. In these simulations σz = 2.0, σ0 = 1.0, and the utility ratio is 0.63.
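The quantities in the caption can be illustrated with a minimal sketch. It assumes a standard Gaussian-belief formulation: each observation is drawn from N(coherence, σz²) with a N(0, σ0²) prior over the mean coherence, confidence is the posterior probability that the chosen direction is correct, and the sure-bet option is taken when confidence falls below the utility ratio. The function names and the exact update equations are illustrative assumptions, not necessarily the paper's implementation.

```python
import math


def posterior_belief(observations, sigma_z=2.0, sigma_0=1.0):
    """Gaussian posterior over mean coherence after t observations.

    Assumes z_i ~ N(coherence, sigma_z^2) with a N(0, sigma_0^2)
    prior, matching the caption's sigma_z and sigma_0 parameters.
    Returns the posterior mean mu_t and posterior variance.
    """
    t = len(observations)
    precision = 1.0 / sigma_0**2 + t / sigma_z**2
    var_t = 1.0 / precision
    mu_t = var_t * sum(observations) / sigma_z**2
    return mu_t, var_t


def confidence(mu_t, var_t):
    """P(chosen direction is correct): the posterior probability that
    the true coherence shares the sign of mu_t, i.e. Phi(|mu_t| / sigma_t)."""
    return 0.5 * (1.0 + math.erf(abs(mu_t) / math.sqrt(2.0 * var_t)))


def choose(mu_t, var_t, utility_ratio=0.63):
    """Pick the sure-bet target when confidence times the reward for a
    correct direction choice is worth less than the sure-bet reward,
    i.e. when confidence < utility ratio (0.63 in the simulations)."""
    if confidence(mu_t, var_t) < utility_ratio:
        return "sure-bet"
    return "right" if mu_t > 0 else "left"
```

Under this formulation the blue/red phase boundary in panel d is simply the level set confidence(μt, t) = utility ratio: weak evidence near μt = 0 yields confidence near 0.5 and a sure-bet choice, while strong evidence pushes confidence above 0.63 and commits to a direction.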