Figure 3
From: Exploring Feature Dimensions to Learn a New Policy in an Uninformed Reinforcement Learning Task

Representative policy estimation and corresponding entropy. (a) Policy estimation. The policy with the highest probability estimate in each trial was regarded as the currently used policy. (b) Current policy within each trial (orange squares; grey dots: time points at which policy transitions occurred) and policy entropy values (black line). Inset: difference in entropy values between policy-transition time points and all other trials (paired t-test, p < 0.01, mean ± SEM). (c) Transition time points with respect to entropy and trial order (blue dots: individual transition time points across all participants; red line: linear regression fit). Transition time points were significantly associated with earlier trials and higher entropy (R² = 0.264, p = 3.01 × 10⁻⁷).
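A minimal sketch of the quantities summarized in this caption, assuming per-trial policy probability estimates are available as a T × K array. The array names, the simulated input, the natural-log entropy base, the transition-detection rule, the unpaired t-test, and the single-predictor regression are all illustrative assumptions, not the authors' analysis code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical input: probability estimates over K candidate policies
# for each of T trials (rows sum to 1). In the paper these come from
# model-based policy estimation; here they are simulated.
T, K = 60, 4
logits = rng.normal(size=(T, K)).cumsum(axis=0) * 0.3
policy_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# (a) The policy with the highest probability estimate in each trial
# is taken as the currently used policy.
current_policy = policy_probs.argmax(axis=1)

# (b) Policy entropy per trial (natural log; the caption does not
# state the base, so this is an assumption).
entropy = -np.sum(policy_probs * np.log(policy_probs + 1e-12), axis=1)

# Transition time points: trials where the argmax policy changes.
transitions = np.flatnonzero(np.diff(current_policy) != 0) + 1

# Compare entropy at transition trials vs. all other trials. The
# caption reports a paired t-test; an unpaired comparison is used
# here because the pairing scheme is not specified.
others = np.setdiff1d(np.arange(T), transitions)
t_stat, p_val = stats.ttest_ind(entropy[transitions], entropy[others])

# (c) Relate transition trial index to entropy at the transition.
# Ordinary least squares with one predictor, as an illustration; the
# caption suggests both trial order and entropy are involved.
slope, intercept, r, p, se = stats.linregress(entropy[transitions],
                                              transitions)
print(f"t = {t_stat:.2f}, p = {p_val:.3g}; R^2 = {r**2:.3f}")
```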