Fig. 1: Design of the hidden-brain-state reinforcement learning task.
From: Unconscious reinforcement learning of hidden brain states supported by confidence

Subjects (N = 18) were assigned to one of two groups, which differed in the brain region targeted by their decoder: visual cortex (VC, N = 9) or prefrontal cortex (PFC, N = 9). In all analyses, brain region was treated as a between-subjects factor; unless this factor had a significant effect, subjects were pooled into a single cohort.

a The learning task consisted of three consecutive sessions. In each session, decoding was performed on fMRI multivoxel patterns, and the decoder output was used in real time to determine the RL state on a trial-by-trial basis. In a given RL state, only one action was optimal, with a high reward probability (0.8); the other action had a low reward probability (0.2). In the third (control) session, the decoder's output likelihood was also used to proportionally define the motion direction of the visual stimulus. Even in this session, early trials had very low coherence; only in the latter half of the session did trials reach coherence high enough for the motion to be easily detected and for subjects to learn the rule consciously.

b Each trial started with a blank intertrial interval (ITI, 6 s). Random dot motion was then shown for 8 s (Stimulus ON). In the first two sessions, the motion was entirely random and the dots were dim (20% of maximum), while in the third session coherence increased progressively over the last 2 s (partially determined by the decoder's likelihood). Subjects then reported the direction of motion (the latent state) and their confidence in that choice, and then gambled on one of two actions (A or B). After action selection, the outcome of the current trial (reward: 30¥/$0.25, or no reward: 0¥/$0) was shown on the screen. After accounting for the haemodynamic delay, decoding was performed on data corresponding to the ITI. This ensured that mental imagery or illusory perception of the stimulus could not drive the latent state that the decoder determined from neural activity. HR haemodynamic response delay, L left, R right.
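The reward contingency in panel a can be illustrated with a small simulation. This is a minimal sketch, not the authors' code or model: the real-time decoder is replaced by a hypothetical `decode_state` function that draws L or R uniformly at random, the state-to-action mapping `OPTIMAL_ACTION` is hypothetical, confidence reports are not modelled, and the learner is a generic tabular Q-learning agent with ε-greedy choice, used only for illustration. Only the reward probabilities (0.8/0.2) and the reward size (30¥) come from the caption.

```python
import random

# From the caption: in each RL state the optimal action is rewarded with
# p = 0.8 and the other action with p = 0.2; a reward is worth 30 yen.
P_OPTIMAL = 0.8
P_SUBOPTIMAL = 0.2
REWARD_YEN = 30

# Hypothetical mapping from latent state (L/R) to its optimal action.
OPTIMAL_ACTION = {"L": "A", "R": "B"}
STATES = ("L", "R")
ACTIONS = ("A", "B")


def decode_state(rng):
    """Hypothetical stand-in for the real-time fMRI decoder.

    Assumption: states are equiprobable and drawn independently per trial.
    """
    return rng.choice(STATES)


def outcome(state, action, rng):
    """Return 30 (reward) or 0 (no reward) according to the contingency."""
    p = P_OPTIMAL if action == OPTIMAL_ACTION[state] else P_SUBOPTIMAL
    return REWARD_YEN if rng.random() < p else 0


def run_agent(n_trials=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Illustrative tabular Q-learning agent that conditions on the state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    total = 0
    for _ in range(n_trials):
        s = decode_state(rng)
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)  # explore
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
        r = outcome(s, a, rng)
        total += r
        # Each trial ends after the outcome, so the update has no
        # bootstrapped next-state term (a contextual-bandit update).
        q[(s, a)] += alpha * (r - q[(s, a)])
    return q, total


if __name__ == "__main__":
    q, total = run_agent()
    for s in STATES:
        best = max(ACTIONS, key=lambda a: q[(s, a)])
        print(s, best, {a: round(q[(s, a)], 1) for a in ACTIONS})
    print("mean reward per trial:", total / 2000)
```

Under the equiprobable-state assumption, any policy that ignores the state earns a reward with probability 0.5, i.e. 15¥ per trial on average, so a mean reward well above 15¥ indicates that the agent is exploiting the hidden state.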