Fig. 1: Reinforcement-learning paradigm and behavior. | Nature Communications

From: Human thalamic low-frequency oscillations correlate with expected value and outcomes during reinforcement learning

a Schematic (derived from the Allen Brain Atlas) of the position of the deep brain stimulation electrodes used to record intra-thalamic signals (ATN anterior thalamic nucleus, DMTN dorsomedial thalamic nucleus, TH thalamus, HTH hypothalamus, GPi/GPe globus pallidus internal/external, LV left ventricle). b Successive screenshots of a typical trial in the reward (top) and punishment (bottom) conditions. Patients had to select one of the two abstract visual cues presented on either side of a central fixation cross and subsequently observed the outcome. Durations are given in seconds. c Average ± SEM reaction times across patients (n = 8 patients) over trials, shown separately for the reward (Rew, green) and punishment (Pun, red) conditions. d Average ± SEM reaction times across patients (n = 8 patients) in the reward and punishment conditions. Dots represent data from individual patients. The asterisk indicates the significance of a paired two-sided t-test comparing reaction times between conditions (t(7) = −3.10, p = 0.017). e Average ± SEM choice performance across patients (n = 8 patients). The average performance predicted by the fitted Q-learning model is indicated by a white dot for each condition. Gray dots represent data from individual patients. Asterisks indicate the significance of one-sample two-sided t-tests comparing the correct-choice rate in each condition to chance level (i.e., 50%; reward: t(7) = 4.23, p = 0.0039; punishment: t(7) = 5.13, p = 0.0014). f Average ± SEM learning curves across patients (n = 8 patients) over trials, shown separately for the reward and punishment conditions. g Average ± SEM choice performance across patients (n = 8 patients) restricted to the last four trials of each condition. Asterisks indicate the significance of one-sample two-sided t-tests comparing the correct-choice rate in each condition to chance level (i.e., 50%; reward: t(7) = 3.78, p = 0.0069; punishment: t(7) = 4.96, p = 0.0016).
h Average ± SEM Akaike Information Criterion (AIC) of the Q-learning (QL) model versus random choices across patients (n = 8 patients). Dots represent data from individual patients. The asterisk indicates the significance of the two-sided paired t-test comparing the AIC of the QL model against that of a random-choice model (t(7) = −3.2, p = 0.015). Source data are provided as a Source Data file.
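Panels e and h rest on fitting a Q-learning model to each patient's choices and comparing its AIC against a random-choice baseline. The sketch below illustrates that model-comparison logic in miniature; it is a hypothetical simplification (two options, Rescorla–Wagner value updates, softmax choice rule, coarse grid-search fit), not the authors' actual fitting procedure, and all parameter values are illustrative.

```python
import math
import random

def q_learning_nll(choices, outcomes, alpha, beta):
    """Negative log-likelihood of choices under a two-armed Q-learning model
    with a softmax decision rule (alpha: learning rate, beta: inverse temperature)."""
    q = [0.0, 0.0]                      # initial action values
    nll = 0.0
    for c, r in zip(choices, outcomes):
        # softmax probability of the option that was actually chosen
        p = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        nll -= math.log(p)
        # Rescorla-Wagner update of the chosen option's value
        q[c] += alpha * (r - q[c])
    return nll

def aic(nll, n_params):
    """Akaike Information Criterion: 2k + 2*NLL (lower is better)."""
    return 2 * n_params + 2 * nll

# Simulate one block of trials from a Q-learner (illustrative parameters),
# then compare the fitted QL model with a random-choice baseline via AIC.
random.seed(0)
n_trials = 80
alpha_true, beta_true = 0.3, 5.0
q, choices, outcomes = [0.0, 0.0], [], []
for _ in range(n_trials):
    p0 = 1.0 / (1.0 + math.exp(-beta_true * (q[0] - q[1])))
    c = 0 if random.random() < p0 else 1
    r = 1.0 if random.random() < (0.75 if c == 0 else 0.25) else 0.0
    q[c] += alpha_true * (r - q[c])
    choices.append(c)
    outcomes.append(r)

# Fit alpha and beta by a coarse grid search (2 free parameters)
grid = [(a / 10, b) for a in range(1, 10) for b in (0.5, 1, 2, 5, 10)]
nll_ql = min(q_learning_nll(choices, outcomes, a, b) for a, b in grid)

# Random-choice baseline: p = 0.5 on every trial, no free parameters
nll_rand = n_trials * math.log(2)

print("AIC Q-learning:", aic(nll_ql, 2))
print("AIC random    :", aic(nll_rand, 0))
```

Because the data come from a learner, the QL model explains the choices far better than chance even after AIC's penalty for its two extra parameters, mirroring the comparison in panel h.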
