Fig. 4: Reinforcement learning models fit to choice behavior in the food task.

A Model comparison; lower WAIC scores indicate better model fit (the blue bar indicates the winning model). The displayed fits include the following models (see Table 1 for parameter values): ‘M0’: base model (standard Q-learning); ‘M1’: base model plus a reward-sensitivity parameter; ‘M2’: base model plus separate learning rates for positive and negative prediction errors; ‘M3’: base model plus a choice-persistence parameter for a particular category; ‘M4’: base model with fitted initial values for each food category; ‘M2 + M4’: model that includes the asymmetric learning rate (M2) and initial value (M4) components; ‘group-difference + M2 + M4’: best-fitting model, which includes group-difference parameters (high eating-disorder ‘HED’ versus low eating-disorder ‘LED’ group) in the positive learning rates (M2) and in the initial values (M4) of the food categories (fit separately for each learning block). B–G Group-difference (HED versus LED) parameter distributions for the best-fitting model; the y-axis indicates density. Parameters were considered significant if their 95% confidence interval did not span 0 (dotted line). In the first learning block of the food task (pre-arousal, top row), (B) HED had higher initial values for low- (versus high-) calorie food stimuli and (C, D) higher positive learning rates across food categories (low and high calorie). However, in the second learning block of the food task (after the emotional arousal manipulation, bottom row), group differences in (E) initial values for the food categories and (F) positive learning rate for low-calorie stimuli were no longer significant. Nevertheless, (G) HED’s positive learning rates for high-calorie food shifted to be lower than those estimated for LED after the emotional arousal manipulation. This change was specific to high-calorie foods (not low-calorie foods) and accounted for the behavioral pattern of food biases in the second block of the food task.
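To make the ‘M2 + M4’ components concrete, the following is a minimal sketch of a Q-learning update with separate learning rates for positive and negative prediction errors (M2) and category-specific starting values (M4). It is an illustration under stated assumptions, not the fitting code used for this figure: the function names, the softmax choice rule with an inverse-temperature parameter, the treatment of a zero prediction error as positive, and all numeric values are hypothetical and are not taken from Table 1.

```python
import math


def softmax_choice_probs(q_values, inverse_temperature):
    """Choice probabilities from Q-values (softmax rule is an assumption here).

    Subtracting the maximum keeps exp() numerically stable.
    """
    max_q = max(q_values)
    weights = [math.exp(inverse_temperature * (q - max_q)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def update_q(q, reward, alpha_pos, alpha_neg):
    """One Q-learning step with asymmetric learning rates (M2 component).

    alpha_pos applies when the prediction error is >= 0 (assumed convention),
    alpha_neg applies when it is negative.
    """
    prediction_error = reward - q
    alpha = alpha_pos if prediction_error >= 0 else alpha_neg
    return q + alpha * prediction_error


# Hypothetical initial values per food category (M4 component).
q = {"low_calorie": 0.6, "high_calorie": 0.4}

# A rewarded high-calorie choice moves the value up by alpha_pos * error.
q["high_calorie"] = update_q(q["high_calorie"], reward=1.0,
                             alpha_pos=0.3, alpha_neg=0.1)

# An unrewarded choice moves it down by the (smaller) alpha_neg * error.
q["high_calorie"] = update_q(q["high_calorie"], reward=0.0,
                             alpha_pos=0.3, alpha_neg=0.1)

probs = softmax_choice_probs([q["low_calorie"], q["high_calorie"]],
                             inverse_temperature=3.0)
```

In this sketch, a group difference in the positive learning rate would correspond to using a different `alpha_pos` for HED and LED, and a group difference in initial values to different starting entries in `q`.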