Introduction

Methamphetamine use disorder (MUD) is characterized by biological, cognitive, and behavioral changes that can be detrimental at both the individual and societal level. Though outcomes vary widely, common psychological consequences include psychosis, suicidality, hostility, anxiety, depression, and psychomotor dysfunction1,2. Despite the growing prevalence of MUD worldwide3, the cognitive mechanisms governing its onset, maintenance, and recurrence remain unclear.

One means by which MUD may be maintained is through the influence of expected negative outcomes of abstinence and associated withdrawal states, which can motivate avoidance when combined with negative reinforcement processes2. In particular, methamphetamine use may attenuate symptoms of depression or somatic anxiety that are brought on or exacerbated by withdrawal. Deficits in interoceptive processing may further contribute to maladaptive behavior, as previous work has shown that individuals with MUD exhibit attenuated neural responses (e.g., insula, anterior cingulate cortex) to aversive somatic states4. Countering withdrawal avoidance instead requires that individuals “test out” abstinence as a means of learning whether they are capable of enduring its short-term consequences to improve longer-term quality of life. In computational neuroscience, the abstract structure of this decision problem is captured by so-called “explore-exploit” decision tasks5,6. In these tasks, one can either exploit current (limited) knowledge to maximize short-term reward, or one can first test the outcomes of different options (explore) to make better informed choices in the long-term. Importantly, there are different exploratory strategies, which depend on distinct computational processes, and some may be more clinically relevant than others7,8,9. Directed exploration (DE), for example, requires keeping track of one’s relative uncertainty about different action outcomes, and then choosing the action for which one has the greatest uncertainty, as this leads to the most information gain. In contrast, so-called random exploration (RE) requires keeping track of one’s total uncertainty across action options, where greater total uncertainty should increase the chance of selecting options that do not currently appear most rewarding, as one might learn that past experiences were misleading. 
In the example of withdrawal avoidance mentioned above, DE may be more relevant, as the individual must recognize that they have greater uncertainty about the outcomes of abstinence than about those of continued use. A diminished drive to resolve relative uncertainty could therefore prevent attempts at abstinence and perpetuate the disorder. If it were demonstrated that reduced exploration and sensitivity to uncertainty were present, they could represent possible treatment targets in clinical studies.

Several studies support this possibility, suggesting that individuals with MUD display lower levels of exploration and altered belief updating. One study showed that participants with amphetamine use disorder engaged in less information-seeking than healthy controls in a decision-making task that had no cost associated with exploration10. This difference could be partially attributable to drug effects, given that chronic amphetamine use has been shown to deplete intracellular dopamine11 and that lower tonic dopamine levels have themselves been linked to reduced exploratory behavior in individuals with substance use disorders12. Furthermore, one longitudinal study of MUD found that participants who decreased methamphetamine use over a period of six weeks showed higher levels of DE by the end of that period13, supporting the idea that methamphetamine use may affect exploratory behavior. Notably, in addition to lower levels of exploration, multiple studies have also found evidence of maladaptive belief updating in individuals with MUD, which is often operationalized in terms of altered learning rates within computational models14,15,16,17,18,19. This overall pattern of altered sensitivity to, and learning from, choice outcomes may help explain continued use and high relapse rates despite reduced quality of life20.

In this study, we therefore had two main aims. First, we aimed to test whether, relative to healthy comparisons (HCs), individuals with MUD would show reduced DE and altered learning rates, as suggested by the literature reviewed above. Second, we sought to test how an aversive interoceptive state (i.e., a somatic anxiety induction) may affect these mechanisms. This allowed us to examine whether physical symptoms typically associated with withdrawal might exacerbate maladaptive decision-making patterns in those with MUD by curtailing further exploration. The latter aim was further motivated by previous work demonstrating increased risk of relapse in substance use disorders under heightened negative affective states21, suggesting that sensitivity to current anxiety levels could be an important factor in promoting maladaptive choice within this population. There is also a related body of work indicating relationships between anxiety, exploration, and learning rates more generally (for a review, see Chou et al.22). As the avoidance dynamics driven by these mechanisms are expected to be similar in other substance use disorders (and reflect known maintenance factors in psychopathology more broadly; e.g., anxiety, depression)23, these results could also inform future studies with a more transdiagnostic focus.

To accomplish these aims, we fit a computational model with both exploration and learning rate parameters to choice behavior on an established explore-exploit decision task in individuals with MUD and HCs—both with and without a breathing-based aversive interoceptive state induction. We then tested for both group differences and effects of induced somatic anxiety. We hypothesized that individuals with MUD would show lower levels of DE than HCs. We also hypothesized that the state anxiety induction would decrease DE in both groups.

As a supplementary aim, we also sought to replicate prior exploratory results9 demonstrating a relationship between DE and cognitive reflectiveness, and to extend them to individuals with MUD. Cognitive reflectiveness is the tendency to think through a problem before making a decision, instead of simply trusting initial impulses or the first answer that comes to mind24. We considered the possibility that lower levels of reflectiveness characteristic of substance use disorder populations (and less reflection on uncertainty in particular) might help to explain reduced DE. This could be of potential clinical relevance, as cognitive reflectiveness has been shown to improve with training25,26. Thus, if this hypothesis were confirmed, it would suggest that improving cognitive reflectiveness might promote more adaptive information-seeking in this population. We therefore tested the supplementary hypothesis that individuals with MUD would differ from HCs in reflectiveness and that this would account for differences in DE within a mediation model.

Methods

Participants

Participants included 56 inpatient treatment-seeking individuals with MUD and 58 HCs. Individuals with MUD were currently abstinent (mean time since methamphetamine use = 47.58 days, mean time since starting treatment = 34.07 days) and recruited from two recovery homes in the Tulsa, Oklahoma area: (1) GRAND and (2) Women in Recovery (WiR). All individuals in the MUD group met criteria for a DSM-5 diagnosis of Current Amphetamine Use Disorder due to use of methamphetamine, which was assessed by clinical interview (Mini International Neuropsychiatric Interview 7; ref. 27). Due to high rates of comorbidity, individuals with MUD were not excluded based on the presence of other substance use disorders or depression/anxiety disorders (for a list of comorbid disorders in the MUD group, see Supplementary Table S1). However, individuals with bipolar disorder, eating disorders, schizophrenia, or obsessive-compulsive disorder were excluded. Current use of psychotropic medications was permitted in the MUD group, as these are frequently utilized by providers in acute substance use treatment. HCs did not have any history of psychiatric illness and were not on any psychotropic medication. We note that data from this HC group have been used for comparison to a different clinical group in a separate report28.

Protocol

After providing informed consent to participate in a larger study protocol approved by WCG IRB (#20211403), participants completed a drug test and breathalyzer assessment to confirm eligibility for the study. Next, individuals with MUD completed the Desire for Speed (Methamphetamine) Questionnaire (DSQ)29 to assess baseline craving levels.

Following completion of this questionnaire, participants were fit with a silicone mask (see Fig. 1) that would later be used for anxiety induction during performance of the Horizon Task (described below)7. This breathing-based anxiety induction apparatus has been used safely and effectively in several previous studies4,30,31. Here, filters are used to add inspiratory resistance (i.e., requiring more effort to breathe in, but no added effort to breathe out), which induces a sensation of air hunger and elevates somatic anxiety. This initial fitting period was part of a sensitivity protocol designed to confirm sufficient comfort with the mask and allow us to assess how anxiety changed as a function of resistance level. During this preliminary sensitivity protocol, participants breathed through the mask while being exposed to six levels of resistance (0, 10, 20, 40, 60, and 80 cmH2O/L/sec) in ascending order for one minute each, with a short break between resistance levels. After each exposure, they were instructed: “Please rate how much anxiety you felt while breathing from 0 to 10” (where 0 indicates no anxiety and 10 indicates maximum possible anxiety). These are referred to below as self-reported anxiety scores. After completing this protocol, participants removed the mask, and the MUD group completed the DSQ a second time to assess whether craving levels had changed due to anxiety induction.

Fig. 1: Anxiety induction and behavioral task.

A The silicone mask used during both runs of the Horizon Task. B An example resistor attached to the mask via a plastic tube (not depicted), causing participants to experience resistance during inhalation. A resistance of 40 cmH2O/L/sec was used for one of the runs of the Horizon Task to induce somatic anxiety. The other run was completed without breathing resistance. C Boxplots showing the median and quartile values of participants’ self-reported anxiety scores at baseline, during the task run without breathing resistance, and during the task run with breathing resistance. D Horizon Task: Participants first observed outcomes of four forced choices before they were allowed to make either one or six free choices between options to maximize the total number of points received. Games with one or six free choices are referred to as Horizon 1 (H1) and Horizon 6 (H6) games, respectively. The forced choices in each game were either equally informative (two forced choices for each slot machine) or unequally informative (three forced choices for one slot machine and one for the other), creating differences in uncertainty about the average reward value of each option. The letters “XX” were displayed in place of point values on the unchosen side.

After this sensitivity protocol, participants completed neuropsychological testing and additional questionnaires as part of the larger study protocol. This ensured participants were able to return to a baseline arousal state before performing the Horizon Task. Participants were then re-fit with the mask before task performance and indicated their baseline level of anxiety (using both the self-report item mentioned above and the State-Trait Anxiety Inventory [STAI] State scale32). Next, they completed two runs of the Horizon Task, where one of the runs included a breathing resistance of 40 cmH2O/L/sec (counterbalanced order across participants). After each run, they again completed the STAI-State scale and indicated their self-reported anxiety during task performance.

Horizon task

As in previous studies7,9, the Horizon Task here consisted of 80 games in which participants chose between two slot machines with different (unknown) average payout values (see Fig. 1 for a depiction of the task). For one of the slot machines, results were sampled from a Gaussian distribution with a mean of either 40 or 60 and a fixed standard deviation of 8. For the other slot machine, the distribution was shifted 4, 8, 12, 20, or 30 points in either direction from the first slot machine.
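The generative structure described above can be illustrated with a minimal Python sketch. It is not the authors' task code: the function names and the seed are hypothetical, and conditions are drawn at random here rather than counterbalanced as in the actual task. Which machine (left or right) receives the base mean is not specified in the text, so the sketch simply returns the pair.

```python
import random

BASE_MEANS = (40, 60)          # possible means for the first slot machine
OFFSETS = (4, 8, 12, 20, 30)   # absolute mean differences between machines
OUTCOME_SD = 8                 # fixed standard deviation of payouts


def sample_game_means(rng):
    """Return (mean_a, mean_b); mean_b is shifted up or down from mean_a."""
    mean_a = rng.choice(BASE_MEANS)
    shift = rng.choice(OFFSETS) * rng.choice((-1, 1))
    return mean_a, mean_a + shift


def sample_outcome(mean, rng):
    """Draw one payout from a machine's Gaussian payout distribution."""
    return rng.gauss(mean, OUTCOME_SD)


rng = random.Random(0)  # hypothetical seed, used only for reproducibility
games = [sample_game_means(rng) for _ in range(80)]
first_payout = sample_outcome(games[0][0], rng)
```

Random draws are used only to keep the sketch short; a counterbalanced design would instead enumerate the mean and offset combinations.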

Participants first observed outcomes of four forced choices before they were allowed to make either one or six free choices between options to maximize the total number of points received. Games with one or six free choices are referred to as Horizon 1 (H1) and Horizon 6 (H6) games, respectively. The forced choices in each game were either equally informative (two forced choices for each slot machine) or unequally informative (three forced choices for one slot machine and one for the other). The different information conditions, decision horizons, and mean slot machine values were all counterbalanced throughout the task.

This task structure is therefore designed to test how choices are influenced by differences in available information, the usefulness of gaining information, and differences in expected reward. Here, a greater propensity to choose the more uncertain option in unequal information trials (i.e., choosing the slot machine that was only chosen once during forced choices) reflects an information bonus, while the propensity to choose the option with lower observed reward reflects decision noise. As described further below, these propensities are assumed to be moderated by learning rates during forced-choice trials and by the expected number of future choices (one vs. six; H1 vs. H6 conditions). Namely, information bonus and decision noise are expected to increase from H1 to H6 (where these increases reflect DE and RE, respectively), because information gained through exploration is only useful for guiding future choices within H6 games.

To minimize potential influences on individual differences in behavior, the observed outcome for each choice was sampled from the underlying Gaussian distributions but fixed across participants and task runs. Thus, two participants who chose the same option on a specific trial always observed the same result. However, after preliminary checking of data in the first four participants (all HCs), unexpected behavior in certain games led us to realize that forced choice outcomes in a few cases were not representative of the underlying distributions, which generated concerns given the number of trials per task condition (i.e., with respect to generative mean differences). To minimize this issue, forced choice results in these cases were re-sampled until they more closely aligned with the true differences between underlying distributions. Any potential effects of task version on behavior were accounted for in subsequent analyses.

Computational model

An established computational model was fit to task behavior (i.e., predicting the first free choice across games), as described in detail by Zajkowski et al.33. In brief, the probability of choosing the right option was calculated using a logistic choice function that included the difference in expected reward values between options, \(\Delta R\), the information difference between options, \(\Delta I\), a potential bias toward the left vs. right choice, \(B\), and decision noise, \(\sigma\), as follows:

$$p\left({choose} \, {right}\right)=\frac{1}{1+\exp \left(\frac{\Delta R+A\Delta I+B}{\sigma }\right)}$$
(1)

The information difference (\(\Delta I\)) was equal to +1 when one outcome was shown for the left option (as choosing the left option would be more informative), −1 when three outcomes were shown for that option, and 0 when two outcomes were shown for each option. This was then scaled by a free parameter referred to as the information bonus (depicted above as \(A\)). The expected reward value difference (\(\Delta R\)) was calculated using a Rescorla-Wagner update equation, where the expected reward value R for each option i on time step t was updated based on the prediction error between the expected reward \({R}_{t}^{i}\) and observed reward \({r}_{i}\). The learning rate \(\alpha\) varied as a function of uncertainty (i.e., in relation to the number of previous observations):

$${R}_{t+1}^{i}={R}_{t}^{i}+\alpha \left({r}_{i}-{R}_{t}^{i}\right)$$
(2)

The initial learning rate \({\alpha }_{0}\) was a free parameter fit to participant data. For each subsequent choice, the learning rate was updated with the following equation:

$$\frac{1}{{\alpha }_{t}^{i}}=\frac{1}{{\alpha }_{t-1}^{i}+{\alpha }_{d}}+1$$
(3)

The drift term, \({\alpha }_{d}\), influences how learning rate changes over time. It can also be related to an asymptotic learning rate (\({\alpha }_{\infty }\)) with the following equation:

$${\alpha }_{d}=\frac{{({\alpha }_{\infty })}^{2}}{1-{\alpha }_{\infty }}$$
(4)
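One way to see where Eq. 4 comes from, assuming the asymptotic learning rate is the fixed point of Eq. 3 (i.e., the value at which the update no longer changes the learning rate), is to set \({\alpha }_{t}^{i}={\alpha }_{t-1}^{i}={\alpha }_{\infty }\) and solve for \({\alpha }_{d}\):

$$\frac{1}{{\alpha }_{\infty }}=\frac{1}{{\alpha }_{\infty }+{\alpha }_{d}}+1\;\Rightarrow\;\frac{1-{\alpha }_{\infty }}{{\alpha }_{\infty }}=\frac{1}{{\alpha }_{\infty }+{\alpha }_{d}}\;\Rightarrow\;{\alpha }_{d}=\frac{{\alpha }_{\infty }}{1-{\alpha }_{\infty }}-{\alpha }_{\infty }=\frac{{({\alpha }_{\infty })}^{2}}{1-{\alpha }_{\infty }}$$

As a quick numerical check, \({\alpha }_{\infty }=0.5\) gives \({\alpha }_{d}=0.25/0.5=0.5\), and substituting both values into Eq. 3 returns a learning rate of 0.5.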

The \({\alpha }_{\infty }\) term (bounded between 0 and 1) is also a free parameter fit to the data (entailing a value for \({\alpha }_{d}\)). This asymptotic learning rate is the value to which learning rate would theoretically converge if the game were played indefinitely (i.e., due to evolving levels of uncertainty after seeing an increasing number of outcomes). Within the Kalman filter model, \({\alpha }_{d}\) reflects the ratio between expected instability (drift) in the underlying reward mean and expected outcome noise around that reward mean. Based on the final equation above, it therefore follows that lower asymptotic learning rates can be seen to reflect an implicit belief that underlying reward means are stable (i.e., minimal drift within a game) and/or that each observed outcome is unreliable (i.e., high levels of outcome noise).
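Equations 1–4 can be put together in a short Python sketch. This is an illustrative reading of the equations, not the fitted hierarchical model. Three assumptions are made here that the text does not state explicitly: \(\Delta R\) is taken as left minus right (to match the sign convention of \(\Delta I\)), the first observed outcome of an option is learned with \({\alpha }_{0}\), and the learning rate is tracked separately for each option. All function names are hypothetical.

```python
import math


def drift_from_asymptote(alpha_inf):
    """Eq. 4: drift term implied by an asymptotic learning rate in (0, 1)."""
    return alpha_inf ** 2 / (1.0 - alpha_inf)


def next_learning_rate(alpha_prev, alpha_d):
    """Eq. 3: 1/alpha_t = 1/(alpha_{t-1} + alpha_d) + 1."""
    return 1.0 / (1.0 / (alpha_prev + alpha_d) + 1.0)


def update_value(value, reward, alpha):
    """Eq. 2: Rescorla-Wagner update of the expected reward."""
    return value + alpha * (reward - value)


def learn_forced_choices(outcomes, alpha_0, alpha_inf, initial_value=50.0):
    """Apply Eqs. 2-3 to the forced-choice outcomes of one option.

    Assumption: the first outcome is learned with alpha_0, and the
    learning rate is updated after every observation of this option.
    """
    alpha_d = drift_from_asymptote(alpha_inf)
    value, alpha = initial_value, alpha_0
    for reward in outcomes:
        value = update_value(value, reward, alpha)
        alpha = next_learning_rate(alpha, alpha_d)
    return value


def p_choose_right(delta_r, delta_i, info_bonus, bias, noise):
    """Eq. 1: logistic choice rule (delta_r assumed to be left minus right)."""
    return 1.0 / (1.0 + math.exp((delta_r + info_bonus * delta_i + bias) / noise))


# Hypothetical unequal-information game: three outcomes shown for the
# left option and one for the right, so delta_i = -1.
left_value = learn_forced_choices([62, 58, 65], alpha_0=0.6, alpha_inf=0.3)
right_value = learn_forced_choices([48], alpha_0=0.6, alpha_inf=0.3)
p_right = p_choose_right(left_value - right_value, delta_i=-1,
                         info_bonus=5.0, bias=0.0, noise=8.0)
```

With \(\Delta I=-1\), raising the information bonus increases the probability of choosing the less-sampled right option, which is the behavior the information bonus parameter is meant to capture.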

To get parameter estimates for each participant, a hierarchical Bayesian model34 with 12 free parameters in total was fit using a Markov Chain Monte Carlo (MCMC) method implemented with MATJAGS35. The spatial bias (\(B\)) and decision noise (\(\sigma\)) were fit separately for the four combinations of horizon (H1 or H6) and information condition (equal or unequal). The information bonus (\(A\)) was fit separately for the two horizon conditions (i.e., this can only be fit for unequal information games). The initial learning rate and asymptotic learning rate were fit across all games together. In our fitting procedure, second-level hyperparameters defined the prior distributions from which individual parameters were sampled (see Supplementary Table S2 for complete specification of these prior distributions). All participants in both conditions were included under a single second-level prior so that the hyperpriors for HCs and individuals with MUD were equivalent. This was done to prevent any artificial bias toward group or condition differences.

Previous work has shown that estimates of initial reward expectations trade off with information bonus estimates on this task33, as optimistic reward expectations can also promote the choice of unfamiliar options. Thus, given our focus on directed exploration, we fixed the initial reward expectation to a neutral value of 50 to ensure reliable parameter estimates for the information bonuses. We assessed the recoverability of model parameters by measuring the correlation between parameters used to simulate data and parameters estimated from fitting that simulated data (see Supplementary Methods and Supplementary Fig. S2 for details).

Measures

Participants were asked to indicate the sex they were assigned at birth. Next, they completed the following measures to assess relevant clinical symptoms as well as trait and state psychological characteristics.

Symptom severity

To measure symptom severity in the MUD group, we used the Drug Abuse Screening Test (DAST), the Methamphetamine Withdrawal Questionnaire (MAWQ), and the Desire for Speed (Methamphetamine) Questionnaire (DSQ) mentioned above. DAST measures overall drug abuse severity and interference with life functioning36; MAWQ measures withdrawal symptoms37; and DSQ measures current craving levels29. These measures were only gathered in the MUD group.

To measure comorbid symptom dimensions associated with MUD, we used questionnaires measuring depression, anxiety, and impulsivity. Overall depressive symptoms were measured using the Patient Health Questionnaire (PHQ-9)38. State and trait anxiety were measured with the State-Trait Anxiety Inventory (STAI-State/Trait)32,39. Impulsivity was measured with the Urgency-Premeditation-Perseverance-Sensation Seeking-Positive Urgency (UPPS-P) Impulsive Behavior Scale Total Score40,41.

Cognitive reflectiveness

The Cognitive Reflection Test (CRT-7)42 measures the tendency to “stop and think” before immediately trusting one’s intuition. The test asks seven short questions designed such that there is an immediately intuitive, but incorrect answer, and a correct answer that, while not logically difficult, requires the individual to devote effortful cognitive resources instead of immediately choosing the intuitively appealing response. An example item is “If it takes 5 machines 5 min to make 5 widgets, how long would it take 100 machines to make 100 widgets?” (intuitive incorrect answer: 100 min; correct answer: 5 min).

Working memory

The List Sorting Working Memory Test from the NIH Toolbox Cognition Battery43 was used to assess working memory. In our analyses, we used participants’ t-scores adjusted for age and sex.

Statistical analyses

This study was not pre-registered. Between-subject statistical analyses were carried out in R (version 4.4.1) with RStudio. As explained further below, k-means clustering was performed for \({{{\rm{\alpha }}}}_{0}\) using the kmeans function of the stats package44. Linear mixed-effects models (LMEs) and logistic mixed regressions were run using the lmer function and the glmer function of the lme4 package45. For the logistic models, the bobyqa optimizer was used to estimate the coefficients. In all mixed-effects models, a term for participant ID was included to allow for random intercepts. Data distributions were visually inspected to confirm that assumptions of each statistical test were met. Effect sizes were calculated with the F_to_eta2 function of the effectsize package46. All continuous predictors were mean-centered using the gscale function of the jtools package47. Unless otherwise stated, categorical variables were sum-coded as factors, including group (HCs = −1, MUD = 1), breathing resistance (absent = −1, present = 1), information condition (equal = −1, unequal = 1), horizon (H1 = −1, H6 = 1), sex (male participants = −1, female participants = 1), and task version (main version = −1, initial version in first four participants = 1).
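The predictor coding described above can be sketched in a few lines of Python. The analyses themselves were run in R; this sketch only illustrates the coding scheme, and the dictionary keys, level labels, and example ages are hypothetical.

```python
from statistics import mean

# Sum codes taken from the text; key and level names are illustrative.
SUM_CODES = {
    "group": {"HC": -1, "MUD": 1},
    "resistance": {"absent": -1, "present": 1},
    "information": {"equal": -1, "unequal": 1},
    "horizon": {"H1": -1, "H6": 1},
    "sex": {"male": -1, "female": 1},
    "task_version": {"main": -1, "initial": 1},
}


def sum_code(factor, level):
    """Return the -1/+1 code for one level of a two-level factor."""
    return SUM_CODES[factor][level]


def mean_center(values):
    """Subtract the sample mean from every value of a continuous predictor."""
    centre = mean(values)
    return [v - centre for v in values]


ages = [24, 31, 45]            # hypothetical ages
centered_ages = mean_center(ages)
```

With sum coding, each main effect in a model is evaluated at the average of the other factors' levels rather than at a reference level, which simplifies interpretation when interactions are included.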

The variables age and sex were included in all models as potential covariates to ensure they did not explain observed effects. As a small number of participants also completed a slightly different version of the task (i.e., with a different sequence of reward values sampled from the underlying generative means; described above), task version was also included in all models as a potential covariate. After controlling for age, sex, and task version, follow-up models were run that additionally included working memory capacity, given that general cognitive ability has previously been shown to positively correlate with performance in the Horizon Task9. As working memory data were missing for a subset of participants (N = 5), its potential explanatory power was only assessed in the subset of participants with available data in these follow-up analyses (as this would otherwise effectively remove data from these five participants from all analyses). When necessary, significant effects were further interpreted using post-hoc contrasts of estimated marginal trends (EMTs) or estimated marginal means (EMMs) using the emmeans package48. All t-tests were two-sided. We report 95% confidence intervals for the regression coefficients estimated in these frequentist analyses.

To evaluate the minimum effect size we could detect when testing the hypothesized effect of group on DE (i.e., in the LME analysis described above), we performed a sensitivity power analysis (using the wp.rmanova function of the WebPower package). Assuming a significance threshold of 0.05, this analysis indicated that our sample size of 114 would provide 80% power to detect a medium effect size of \({\eta }_{p}^{2}\) = 0.065.

Protocol validation

To test whether administration of the moderate breathing resistance level (40 cmH2O/L/sec) used during task performance successfully increased anxiety, an LME was run predicting self-reported anxiety during the task, with resistance condition (baseline, task run with resistance, task run without resistance), group, and their interaction as predictors. Identical LMEs were also run using STAI-State scores as the outcome variable in place of self-reported anxiety to confirm consistency.

To further confirm the efficacy of the aversive state induction, we performed another LME to test if administration of the breathing resistance successfully induced anxiety within the pre-task exposure protocol. This model specifically assessed whether self-reported anxiety level was predicted by breathing resistance level (0, 10, 20, 40, 60, and 80 cmH2O/L/sec), group, and/or their interaction.

Computational analyses

Computational measures included: directed exploration (DE), random exploration (RE), initial learning rate (\({{{\rm{\alpha }}}}_{0}\)), and asymptotic learning rate (\({{{\rm{\alpha }}}}_{\infty }\)). DE was calculated by subtracting the information bonus parameter fit to H1 games from that fit to H6 games. This allowed us to measure the degree to which participants became more information-seeking as decision horizon increased (i.e., when information became goal-relevant). Note that this only applied to games in which unequal information was given. RE was calculated by subtracting the decision noise parameter fit to H1 games from that fit to H6 games, allowing us to measure the degree to which participants became less value sensitive in their initial choice as decision horizon increased (i.e., which can also serve as an information-seeking strategy). Analyses of RE were here restricted to games where equal information was given, such that directed information-seeking could not account for any apparent changes in value sensitivity. For analyses with \({{{\rm{\alpha }}}}_{\infty }\) as an outcome variable, we also included \({{{\rm{\alpha }}}}_{0}\) as a covariate, given that those with the highest initial learning rate tended to experience the greatest decrease in learning rate over time (somewhat analogous to regression to the mean). For these variables, potential outliers were identified using an iterative Grubbs’ method (threshold: p < 0.01), implemented with the grubbs.test function from the outliers package49.
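The two difference scores defined above reduce to simple subtractions, sketched here in Python. The parameter values are hypothetical and the function names are illustrative only.

```python
def directed_exploration(info_bonus_h1, info_bonus_h6):
    """DE: information bonus in H6 games minus information bonus in H1 games."""
    return info_bonus_h6 - info_bonus_h1


def random_exploration(noise_h1_equal, noise_h6_equal):
    """RE: decision noise in H6 minus H1, equal-information games only."""
    return noise_h6_equal - noise_h1_equal


# Hypothetical fitted parameters for one participant in one task run.
de = directed_exploration(info_bonus_h1=2.0, info_bonus_h6=7.5)
re_score = random_exploration(noise_h1_equal=4.0, noise_h6_equal=9.0)
```

Positive values indicate that the participant became more information-seeking (DE) or less value-sensitive (RE) when more free choices remained.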

To examine potential effects of group and breathing resistance on each of these model parameters, separate LMEs were run predicting each parameter value based on group, resistance condition, and their interaction. Note that, because \({{{\rm{\alpha }}}}_{0}\) values showed a bimodal distribution across participants (see below), we instead performed a k-means clustering analysis and divided participants into those with high and low values, and then used cluster membership as a categorical outcome variable in logistic mixed regressions in place of LMEs. If extreme values of model parameters were identified (tested within each resistance condition separately), analyses were repeated with those data removed, and any discrepancies between results with/without outliers were reported.
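The split of a bimodal parameter into high and low clusters can be sketched as a one-dimensional two-cluster k-means. This is a simplified illustration in Python using Lloyd's algorithm with a deterministic start at the minimum and maximum values; it is not the R kmeans routine used in the analyses, which may initialize and iterate differently. The example values are hypothetical.

```python
def two_means_1d(values, max_iter=100):
    """Two-cluster k-means on 1-D data; label 1 marks the high-value cluster."""
    centers = [min(values), max(values)]   # deterministic initialization
    labels = []
    for _ in range(max_iter):
        new_labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        if new_labels == labels:           # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:                    # keep old center if cluster is empty
                centers[k] = sum(members) / len(members)
    return labels, centers


# Hypothetical initial learning rates with a clear bimodal split.
alpha_0_values = [0.05, 0.10, 0.12, 0.80, 0.85, 0.90]
cluster_labels, cluster_centers = two_means_1d(alpha_0_values)
```

The resulting binary labels are the kind of categorical outcome that a logistic mixed regression can then take in place of the continuous parameter.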

To assess whether observed differences in DE and RE were better explained by differences in H1 or H6, separate models were also run with information bonus or decision noise as the outcome variable, respectively, including group, horizon, and their interaction as predictors. Note that the strength of the interaction between horizon and group can here also be seen as a test of group differences in DE or RE. As a complementary approach, Bayesian analyses were also run using the brm function within the brms package (using default [flat] prior settings)50. These models had the same structure as our frequentist analyses, but additionally incorporated the posterior variance of each parameter estimate for each participant. Bayesian 95% Credible Intervals (BCIs) and Bayes Factors (BFs) were used to compare the likelihood of a model including the interaction between horizon and group to a model omitting that interaction. The full model was then compared to an extended model that additionally included a three-way interaction between breathing resistance, horizon, and group. This was used to evaluate whether there was evidence for a group difference in how the breathing resistance influenced DE (i.e., whether group membership moderated the interaction between resistance and horizon).

For the learning rate parameters, we conducted a further Bayesian analysis to test for an effect of group by comparing a full model including group, breathing resistance, and their interaction to a reduced model with only breathing resistance. To evaluate the interaction between group and breathing resistance, we also compared the full model to a reduced model that included only the main effects of group and breathing resistance. This allowed us to determine whether group differences in the learning rate parameters were independent of breathing resistance.

To better interpret our main results, we also carried out a test of group differences in avoidance of uncertainty by calculating the (frequency-based) probability of choosing the option with the greater observed mean when it was also the option with greater uncertainty (i.e., with only one forced-choice outcome shown). This was done for the first free choice on H1 and H6 games separately. We then tested an LME predicting this probability for H6 games based on group and resistance condition. The corresponding probability for H1 games was also included in the model to account for any baseline tendencies to approach or avoid uncertainty when exploration would not be helpful. An analogous LME was also run to test if negative outcome avoidance differed between groups. This model instead predicted the probability of choosing the option with the greater observed reward mean when it was also the low-uncertainty option (i.e., the one with three observed forced-choice outcomes) in H6 games based on group and resistance condition, while again controlling for baseline tendencies in H1. Here, we reasoned that those with greater negative outcome avoidance would more often choose the option with the higher observed value, despite the greater information gain afforded by the other option.
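The frequency-based probabilities described above can be computed as in the following Python sketch. The game record layout (keys such as "observed_means" and "forced_counts") is a hypothetical format chosen for illustration, and ties in observed means are not handled.

```python
def prob_high_mean_choice(games, horizon, high_mean_is_uncertain):
    """Proportion of first free choices that selected the higher observed mean.

    Only unequal-information games at the given horizon are counted, and only
    those where the higher-mean option is (or is not) the more uncertain one.
    """
    hits = []
    for game in games:
        if game["horizon"] != horizon or game["info"] != "unequal":
            continue
        means = game["observed_means"]
        counts = game["forced_counts"]
        best = max(means, key=means.get)          # higher observed mean
        uncertain = min(counts, key=counts.get)   # fewer forced-choice outcomes
        if (best == uncertain) != high_mean_is_uncertain:
            continue
        hits.append(game["first_choice"] == best)
    return sum(hits) / len(hits) if hits else float("nan")


example_games = [
    {"horizon": 6, "info": "unequal", "first_choice": "left",
     "observed_means": {"left": 55.0, "right": 48.0},
     "forced_counts": {"left": 1, "right": 3}},
    {"horizon": 6, "info": "unequal", "first_choice": "left",
     "observed_means": {"left": 40.0, "right": 60.0},
     "forced_counts": {"left": 3, "right": 1}},
    {"horizon": 6, "info": "unequal", "first_choice": "left",
     "observed_means": {"left": 70.0, "right": 50.0},
     "forced_counts": {"left": 3, "right": 1}},
    {"horizon": 1, "info": "unequal", "first_choice": "right",
     "observed_means": {"left": 52.0, "right": 45.0},
     "forced_counts": {"left": 1, "right": 3}},
]

p_uncertain_h6 = prob_high_mean_choice(example_games, 6, True)
p_certain_h6 = prob_high_mean_choice(example_games, 6, False)
```

Calling the function with high_mean_is_uncertain=True corresponds to the uncertainty-avoidance measure, and with False to the negative-outcome-avoidance measure; the H1 value of each would enter the model as the baseline covariate.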

Finally, we tested if patterns of behavior captured by the model were related to individual differences in somatic anxiety, substance use symptoms, and/or measures of affective psychopathology. To do so, we first ran LMEs predicting each computational parameter value based on self-reported anxiety level during the task. Among other covariates, the effects of group, resistance condition, and their interaction were also included, as well as the interaction between resistance condition and self-reported anxiety. We next tested LMEs within the MUD group predicting each parameter value based on DAST, DSQ, or MAWQ scores (tested separately), accounting for the effect of breathing resistance. We were similarly interested in examining whether the relationship between parameter values and drug-related symptoms differed between resistance conditions, so these interactions were also included in the models. We then tested LMEs within the MUD group predicting each parameter value based on the presence of other comorbid substance use disorders separately (present/absent), accounting for any effect of breathing resistance. Analogous models were also run that included continuous measures of affective psychopathology (i.e., PHQ-9, STAI-Trait, and UPPS-P) as predictors. To determine if medication status (medicated/unmedicated), time since last use of methamphetamine, and/or time since starting treatment might influence model parameters, similar LMEs were run with those variables separately.

Model-based predictors of choice accuracy

To evaluate the effect of experimental condition on task performance, we tested an LME predicting first free choice accuracy based on horizon, information condition, group, and breathing resistance. This model also included two three-way interactions (horizon, group, and breathing resistance; horizon, group, and information condition) and their respective two-way interactions. We additionally tested if these variables predicted accuracy across the free choices of the H6 condition. This model included choice number (1–6), group, information condition, and breathing resistance, along with two three-way interactions (choice number, group, and breathing resistance; choice number, group, and information condition) and their respective two-way interactions. To examine how model parameters influenced subsequent task performance in H6 trials (i.e., to interpret whether some values might be considered more optimal than others), we also examined if accuracy was predicted by each of the model parameters, free choice number (2–6; i.e., excluding the first free choice to which these parameters were directly fit), resistance, and/or group, and whether a given model parameter moderated the improvement in accuracy as choice number increased.

Model parameters as predictors of cognitive reflectiveness

We also sought to replicate prior results9, and extend them to individuals with MUD, linking computational Horizon Task metrics (DE in particular) to cognitive reflectiveness (i.e., CRT scores). We therefore tested if model parameters could be predicted by the number of correct answers on this measure (accounting for effects of resistance). Group was included as a covariate to ensure that observed effects were not explained by group differences in cognitive reflectiveness or parameter values (see Table 1). If CRT scores significantly predicted a given model parameter, we subsequently tested if CRT scores mediated group differences in that parameter. The mediate function within the mediation package was used to test for these effects using a nonparametric bootstrapping approach that included 5,000 Monte Carlo simulations. Possible effects of breathing resistance were also incorporated into these analyses. After testing for potential mediation effects, we further determined if any observed relationships were also present within the MUD group alone by testing LMEs, restricted to that group, predicting computational parameter values based on CRT scores.
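To make the logic of the bootstrapped mediation test concrete, the sketch below estimates an indirect effect and a percentile confidence interval by resampling participants. It is a simplified stand-in, not the mediation package's procedure: it uses ordinary least squares with one observation per participant, ignores the repeated-measures structure and the breathing-resistance covariate, and all function names are hypothetical.

```python
import random


def _cov(u, v):
    """Sample covariance (the variance when u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """Return (total c, direct c_prime, indirect a*b) from two OLS fits.

    x: predictor (e.g., group coded 0/1), m: mediator, y: outcome.
    """
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = sxm / sxx                              # slope of M ~ X
    det = sxx * smm - sxm ** 2
    b = (smy * sxx - sxm * sxy) / det          # slope on M in Y ~ X + M
    c_prime = (sxy * smm - sxm * smy) / det    # slope on X in Y ~ X + M
    return sxy / sxx, c_prime, a * b


def bootstrap_indirect(x, m, y, n_boot=5000, seed=0):
    """Percentile 95% interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ms = [m[i] for i in idx]
        ys = [y[i] for i in idx]
        try:
            draws.append(mediation_paths(xs, ms, ys)[2])
        except ZeroDivisionError:
            continue  # degenerate resample (e.g., only one group drawn)
    draws.sort()
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]
```

In this linear setting the total effect equals the direct effect plus the indirect effect exactly. Mediation is conventionally supported when the bootstrap interval for the indirect effect excludes zero.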

Table 1 Descriptive characteristics for HCs and individuals with MUD

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

To acquire a comprehensive clinical phenotype, participants completed several cognitive and clinical scales. Compared to HCs, the MUD group showed elevated symptoms of anxiety and depression, higher impulsivity, reduced cognitive performance (working memory), and lower cognitive reflectiveness (see Table 1).

The aversive state induction successfully increased anxiety during task performance across all participants

In the LME predicting self-reported anxiety based on group, resistance condition (baseline, task run without resistance, task run with resistance), and their interaction, all effects were significant (Group: [F(1,109.0) = 30.20, p < 0.001, \({\eta }_{p}^{2}\) = 0.22, b = 1.03, CI = (0.281, 1.781)]; Resistance Condition: [F(2,224.0) = 82.58, p < 0.001, \({\eta }_{p}^{2}\) = 0.42, b(No Resistance) = 0.155 (−0.316, 0.626), b(Resistance) = 1.345 (0.874, 1.816)]; Interaction: [F(2,224.0) = 10.43, p < 0.001, \({\eta }_{p}^{2}\) = 0.09, b(MUD x No Resistance) = 0.738 (0.065, 1.410), b(MUD x Resistance) = 1.566 (0.894, 2.238)]), indicating that the breathing resistance successfully induced anxiety during the task and that this effect was magnified for individuals with MUD (see Fig. 1). These effects were also observed for the analogous LME predicting STAI-State anxiety (Group: [F(1,109.0) = 14.88, p < 0.001, \({\eta }_{p}^{2}\) = 0.12, b = 3.98, CI = (0.011, 7.950)]; Resistance Condition: [F(2,224.0) = 113.48, p < 0.001, \({\eta }_{p}^{2}\) = 0.50, b(No Resistance) = 1.948 (−0.214, 4.111), b(Resistance) = 9.138 (6.975, 11.300)]; Interaction: [F(2,224.0) = 5.73, p = 0.004, \({\eta }_{p}^{2}\) = 0.05, b(MUD x No Resistance) = 3.909 (0.823, 6.994), b(MUD x Resistance) = 5.094 (2.009, 8.180)]) and in the LME regressing self-reported anxiety on continuous levels of resistance, group, and their interaction (Resistance Condition: [F(1,564.5) = 389.63, p < 0.001, \({\eta }_{p}^{2}\) = 0.41, b = 0.041, CI = (0.037, 0.045)]; Group: [F(1,109.1) = 12.41, p < 0.001, \({\eta }_{p}^{2}\) = 0.10, b = 0.72, CI = (0.323, 1.123)]; Interaction: [F(1,564.5) = 22.72, p < 0.001, \({\eta }_{p}^{2}\) = 0.04, b = 0.01, CI = (0.006, 0.014)]). All effects described above remained significant when working memory was included as a predictor in the models (see Supplemental Results for details).
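For readers who want to relate the reported effect sizes to the test statistics, partial eta squared can be recovered from an F value and its degrees of freedom. The sketch below uses the standard conversion; whether the authors computed their values this way is an assumption, although the conversion does reproduce the figures reported above. The function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F statistic to partial eta squared.

    Standard identity: eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the self-reported anxiety LME reported above.
group_effect = partial_eta_squared(30.20, 1, 109.0)
resistance_effect = partial_eta_squared(82.58, 2, 224.0)
interaction_effect = partial_eta_squared(10.43, 2, 224.0)
print(round(group_effect, 2), round(resistance_effect, 2), round(interaction_effect, 2))
# prints: 0.22 0.42 0.09
```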

Individuals with methamphetamine use disorder show lower task performance than healthy comparisons

As an initial assessment of task performance, we tested an LME predicting first free choice accuracy (i.e., choice of the option with the higher average reward value) based on the relevant task and experimental conditions. In brief, accuracy was higher in H1 than H6 games (F(1,788.0) = 137.10, p < 0.001, \({\eta }_{p}^{2}\) = 0.15, b = −0.04, CI = [−0.043, −0.031]), higher in the equal information condition than the unequal information condition (F(1,788.0) = 89.32, p < 0.001, \({\eta }_{p}^{2}\) = 0.10, b = −0.03, CI = [−0.036, −0.024]), and individuals with MUD showed lower accuracy than HCs overall (F(1,109.0) = 31.10, p < 0.001, \({\eta }_{p}^{2}\) = 0.22, b = −0.07, CI = [−0.092, −0.045]). Further, the change in accuracy between H1 and H6 games was less in individuals with MUD than HCs (F(1,788.0) = 8.14, p = 0.004, \({\eta }_{p}^{2}\) = 0.01, b = 0.01, CI = [0.003, 0.015]), consistent with less exploration in the MUD group. These effects remained significant when working memory was included as a predictor in the model (see Supplementary Table S6 for full model results and Supplementary Table S7 for descriptive information regarding task accuracy in each of the experimental conditions).

To confirm expected improvements in accuracy over time, and potential modulation of this effect by group or anxiety induction, we tested a subsequent LME predicting accuracy on the six free choices of H6 games based on group, information condition, choice number (1–6), breathing resistance, and the three-way interactions of choice number, group, and breathing resistance, as well as between choice number, group, and information condition (including respective two-way interactions). We observed that accuracy was again higher in HCs (estimated marginal mean [EMM] = 0.81) than in individuals with MUD (EMM = 0.69; contrast [c] = 0.126, t(109) = 5.42, p < 0.001), higher in the equal than unequal information condition (EMM[equal] = 0.77, EMM[unequal] = 0.73, c = 0.040, t(2612) = 10.73, p < 0.001), and increased as a function of choice number (see Table 2). There was also a significant interaction between group and resistance (F(1,2612.0) = 4.15, p = 0.042), reflecting a numerical increase in accuracy with the breathing resistance in HCs (EMM[Resistance] = 0.82, EMM[No Resistance] = 0.81, c = 0.008, t(2612) = 1.52, p = 0.129) and a numerical decrease in accuracy in those with MUD (EMM[Resistance] = 0.68, EMM[No Resistance] = 0.69, c = −0.007, t(2612) = −1.36, p = 0.172; see Fig. 2). The effects of choice number (F(1,2497.0) = 116.85, p < 0.001, \({\eta }_{p}^{2}\) = 0.04, b = 0.012, CI = [0.010, 0.014]), group (F(1,103.0) = 18.24, p < 0.001, \({\eta }_{p}^{2}\) = 0.15, b = −0.054, CI = [−0.079, −0.029]), information condition (F(1,2497.0) = 86.22, p < 0.001, \({\eta }_{p}^{2}\) = 0.03, b = −0.018, CI = [−0.022, −0.014]), and the interaction between group and resistance (F(1,2497.0) = 4.60, p = 0.032, \({\eta }_{p}^{2}\) < 0.01, b = −0.004, CI = [−0.008, 0.0004]) remained significant when controlling for working memory in the subset of participants for whom these scores were available.

Table 2 LME results predicting accuracy across H6 free choice trials
Fig. 2: Choice accuracy on the horizon task.

Top: H1 and H6 accuracy for each choice by group, resistance, and information condition. Error bars show 95% confidence intervals for accuracy at each choice number. As expected, accuracy was lower in H6 than H1 games for the first free choice (i.e., reflecting random exploration) and improved with further choices in H6 games. The change in first free choice accuracy between H1 and H6 games was also greater in HCs than in individuals with MUD, consistent with greater exploration in HCs. Accuracy was greater in the equal information games than in the unequal information games. Bottom: Accuracy in the H1 and H6 conditions for each group (i.e., collapsed across choice numbers), separated by resistance and information condition. Accuracy was higher in HCs than in individuals with MUD. Breathing resistance increased accuracy for HCs, but decreased accuracy for individuals with MUD in H6.

Individuals with methamphetamine use disorder show less directed exploration, random exploration, and slower learning rates than healthy comparisons

The intercorrelations between each of the fitted parameter values were low (see Supplementary Fig. S1), suggesting that each parameter explained independent aspects of participant behavior. Further, parameter recoverability was sufficient, indicated by moderate-to-high correlations between parameter values used to simulate behavior and corresponding parameter values estimated from that simulated behavior (see Supplementary Fig. S2). Descriptive information regarding the computational parameter values for each group and experimental condition is provided in Supplementary Table S8.

In our primary computational analyses, LMEs were used to predict model parameter values based on group, resistance level, and their interaction (see Table 3). Results showed higher values in HCs for DE (EMM[HCs] = 6.06, EMM[MUD] = 4.58, c = 1.48, t(109) = 2.33, p = 0.021), RE (EMM[HCs] = 1.67, EMM[MUD] = 1.04, c = 0.63, t(109) = 1.98, p = 0.050), \({{{\rm{\alpha }}}}_{0}\) (initial learning rate), and \({{{\rm{\alpha }}}}_{\infty }\) (asymptotic learning rate; i.e., the value to which a participant’s learning rate would theoretically converge if the game were played indefinitely; EMM[HCs] = 0.30, EMM[MUD] = 0.22, c = 0.077, t(118.3) = 3.06, p = 0.003). Notably, the group difference in RE was no longer significant after removing two potential outliers, identified using Grubbs’ test as high values that fell sufficiently outside the overall sample distribution (F(1,110.0) = 1.59, p = 0.209, \({\eta }_{p}^{2}\) = 0.01). In logistic mixed regressions predicting \({{{\rm{\alpha }}}}_{0}\) (see Fig. 3 for a visualization of the bimodality of the distribution), we observed a significant group difference (proportion in high-value cluster: HCs = 0.67; MUD = 0.32; see Table 3 for statistical results). There was no main effect of breathing resistance or interaction with group for any parameter. Complementary Bayesian models predicting each learning rate based on group, breathing resistance, and their interaction allowed incorporation of information regarding posterior variances around each parameter estimate. These analyses provided further evidence for lower values of \({{{\rm{\alpha }}}}_{0}\) (b = −0.15, BCI = [−0.22, −0.09], BF10 = 51.09) and \({{{\rm{\alpha }}}}_{\infty }\) (b = −0.06, BCI = [−0.09, −0.04], BF10 = 14.48) in individuals with MUD. Notably, however, when \({{{\rm{\alpha }}}}_{0}\) was included as a covariate in the model predicting \({{{\rm{\alpha }}}}_{\infty }\), we did not find evidence for group differences (b = −0.03, BCI = [−0.06, −0.01], BF10 = 0.04). There was also less evidence for models including an interaction between group and resistance condition in predicting \({{{\rm{\alpha }}}}_{0}\) (b = −0.002, BCI = [−0.01, 0.01], BF10 = 0.01) and \({{{\rm{\alpha }}}}_{\infty }\) (b = 0.004, BCI = [−0.01, 0.02], BF10 = 0.02), supporting the absence of an effect of breathing resistance in either group.

Table 3 Results testing effects of group and resistance on primary computational measures
Fig. 3: Group differences in exploration and learning on the horizon task.

Plots depict participants’ computational parameter estimates for the Horizon Task model, where parameter values in task runs with and without the anxiety induction (breathing resistance) for each participant are connected using thin lines. Thick lines and surrounding confidence ribbons represent the mean and standard error for parameter values in each group. Results indicate that the MUD group showed lower levels of directed exploration, random exploration, and learning rates compared to HCs, both with and without the breathing resistance manipulation.

When working memory was included as an additional covariate in the subset of participants with available data, effects of group in the LMEs above remained significant for all parameters, except in relation to DE, which became marginal (F(1,103.0) = 3.09, p = 0.082, \({\eta }_{p}^{2}\) = 0.03, b = −0.614, CI = [−1.298, 0.071]; see Supplementary Table S9). This suggested poor working memory might contribute to reduced DE in those with MUD. However, working memory was not itself a significant predictor in the model predicting DE (F(1,103.0) = 0.70, p = 0.404, \({\eta }_{p}^{2}\) < 0.01, b = 0.027, CI = [−0.036, 0.090]), RE (F(1,103.0) = 0.97, p = 0.326, \({\eta }_{p}^{2}\) < 0.01, b = −0.016, CI = [−0.048, 0.016]), \({{{\rm{\alpha }}}}_{0}\) (\({\chi }^{2}\)(1) = 1.85, p = 0.174, b = 0.050, CI = [−0.022, 0.122]), or \({{{\rm{\alpha }}}}_{\infty }\) (F(1,102.8) = 0.03, p = 0.869, \({\eta }_{p}^{2}\) < 0.01, b = −0.0002, CI = [−0.003, 0.002]).

To better interpret these group differences, effects on parameters in H1 and H6 were then examined separately. In an LME predicting the information bonus parameter by horizon, group, and their interaction, the interaction was significant (F(1,340.0) = 12.60, p < 0.001, \({\eta }_{p}^{2}\) = 0.04, b = −0.383, CI = [−0.595, −0.172]), reflecting the group difference in DE described above (see Fig. 3). Post-hoc contrasts showed that HCs had higher information bonus values than individuals with MUD in H6 (HCs–MUD = 1.66, t(139.1) = 2.63, p = 0.010), but not in H1 (HCs–MUD = 0.13, t(139.1) = 0.20, p = 0.843). Notably, this interaction remained significant when accounting for working memory (F(1,325.0) = 10.72, p < 0.001, \({\eta }_{p}^{2}\) = 0.03, b = −0.360, CI = [−0.575, −0.144]), supporting an independent group difference in DE. Complementary Bayesian analyses incorporating the posterior variance around information bonus estimates provided further evidence for this interaction between group and horizon (b = −0.38, BCI = [−0.63, −0.14], BF10 = 38.78). These analyses also confirmed lower evidence for the model with a three-way interaction between group, horizon, and breathing resistance (b = 0.06, BCI = [−0.18, 0.30], BF10 = 0.32), supporting the absence of an effect of breathing resistance in either group.

In an analogous LME predicting decision noise based on horizon, group, and their interaction, the interaction was marginal (F(1,340.0) = 3.68, p = 0.056, \({\eta }_{p}^{2}\) = 0.01, b = −0.117, CI = [−0.237, 0.003]). However, this interaction became significant when accounting for working memory in the subset of participants with available data (F(1,325.0) = 4.13, p = 0.043, \({\eta }_{p}^{2}\) = 0.01, b = −0.129, CI = [−0.253, −0.005]). Post-hoc contrasts here suggested that group differences in RE were driven by marginally greater decision noise in those with MUD than HCs in H1 (HCs–MUD = −0.40, t(212.9) = −1.81, p = 0.072), with no difference in H6 (HCs–MUD = 0.12, t(212.9) = 0.53, p = 0.598; see Fig. 3). However, complementary Bayesian analyses did not provide support for this interaction (b = −0.02, BCI = [−0.15, 0.11], BF10 = 0.18). There was again less evidence for a model with the three-way interaction between group, horizon, and breathing resistance (b = −0.02, BCI = [−0.15, 0.11], BF10 = 0.002).

Individuals with methamphetamine use disorder showed greater avoidance of uncertainty

As a complementary test of uncertainty avoidance, we calculated the probability of choosing the option with the greater observed mean reward when it was also the option with greater uncertainty (i.e., only one forced choice outcome shown) on the first free choice for H1 and H6 games separately. Here, we reasoned that greater uncertainty avoidance would be reflected by fewer choices of the uncertain option despite the greater observed reward. To evaluate this, we tested an LME predicting this probability for H6 games based on group and resistance condition. The corresponding probability for H1 games was also included in the model to account for any baseline tendency to approach or avoid uncertainty when exploration would not be helpful. This LME revealed a main effect of group, such that individuals with MUD had a lower tendency to choose the option with a greater mean reward value when this option had greater uncertainty (EMM[HCs] = 0.82, EMM[MUD] = 0.74, F(1,117.2) = 6.47, p = 0.012, b = −0.039, CI = [−0.069, −0.009]). However, the effect of group became marginal when working memory was included in the model (F(1,108.2) = 3.80, p = 0.054, b = −0.032, CI = [−0.064, 0.001]). An analogous LME was run to test if negative outcome avoidance differed between groups. This model instead predicted the probability of choosing the option with the greater observed mean reward when it was also the option with lower uncertainty in H6 games based on group and resistance condition, while again controlling for baseline tendencies in H1. Here, we reasoned that those with greater negative outcome avoidance would more often choose the option with the higher observed reward value, despite the greater information gain afforded by the other option. In this case, there was no effect of group (EMM[HCs] = 0.67, EMM[MUD] = 0.66, F(1,112.3) = 0.02, p = 0.895, b = −0.002, CI = [−0.039, 0.034]), and Bayesian analyses indicated the data provided strong evidence for the absence of this effect (b = 0.004, BCI = [−0.025, 0.033], BF10 = 0.038). This suggested HCs and individuals with MUD primarily differed in their avoidance of uncertainty rather than avoidance of negative outcomes.

No relationships were observed between computational parameter values and state anxiety, substance use symptoms, or measures of affective psychopathology

To determine if somatic anxiety related to the patterns of behavior captured by the model, we estimated LMEs predicting each computational parameter value based on self-reported anxiety during the task. This tested whether those with greater state anxiety also had greater changes in behavior. Among other covariates, effects of group and resistance condition were also included, as well as the interaction between resistance condition and self-reported anxiety. This interaction captured the possibility that those who were more sensitive to the somatic manipulation (i.e., showed greater increases in anxiety) also showed greater changes in exploratory behavior or learning. In these models, we did not observe any effects of self-reported anxiety in predicting DE (F(1,219.0) = 0.26, p = 0.611, \({\eta }_{p}^{2}\) < 0.01, b = −0.117, CI = [−0.345, 0.111]), RE (F(1,192.6) = 0.11, p = 0.740, \({\eta }_{p}^{2}\) < 0.01, b = −0.014, CI = [−0.149, 0.121]), \({\alpha }_{0}\) (\({\chi }^{2}\)(1) = 0.07, p = 0.790, b = −0.100, CI = [−0.421, 0.222]), or \({\alpha }_{\infty }\) (F(1,211.6) = 1.24, p = 0.266, \({\eta }_{p}^{2}\) < 0.01, b = 0.006, CI = [−0.004, 0.015]). Nor did we observe a significant interaction between self-reported anxiety and resistance condition in predicting DE (F(1,130.7) = 2.49, p = 0.117, \({\eta }_{p}^{2}\) = 0.02, b = 0.130, CI = [−0.031, 0.291]), RE (F(1,147.2) = 0.13, p = 0.720, \({\eta }_{p}^{2}\) < 0.01, b = −0.021, CI = [−0.136, 0.094]), \({\alpha }_{0}\) (\({\chi }^{2}\)(1) = 0.98, p = 0.321, b = 0.113, CI = [−0.110, 0.336]), or \({\alpha }_{\infty }\) (F(1,136.9) = 0.13, p = 0.719, \({\eta }_{p}^{2}\) < 0.01, b = −0.001, CI = [−0.008, 0.006]). Thus, we did not find evidence that individual differences in sensitivity to the somatic manipulation were related to exploratory tendencies or other patterns of behavior in the task.

Follow-up LMEs were run predicting model parameters based on substance use symptoms in the MUD group (i.e., DAST, DSQ, MAWQ; tested separately), breathing resistance, and their interaction. In all models, we observed no significant main effects of substance use symptoms nor interaction effects (see Supplementary Tables S10–S12). However, in the models predicting \({{{\rm{\alpha }}}}_{\infty }\) based on MAWQ and DAST (separately), we observed significant effects of breathing resistance when working memory was included as a covariate (see Supplementary Tables S11, S12). Considering the absence of an effect of breathing resistance in other computational analyses, and that these results were not hypothesized, we simply note them here for the interested reader and for purposes of future hypothesis generation.

LMEs testing potential parameter differences within individuals with MUD based on continuous measures of psychopathology (PHQ-9, UPPS-P Total, and STAI-Trait Scores; tested separately), medication status (medicated/unmedicated), time since starting treatment, time since last methamphetamine use, and comorbid psychopathology (present/absent) did not show any significant effects (see Supplementary Table S13 for full statistical results).

Greater random exploration and faster learning rates each improved performance

To evaluate the theoretical significance of observed group differences, we tested whether some values for each model parameter might be considered more optimal than others with respect to task performance after the first free choice in H6 (see Supplementary Tables S14, S15 for full statistical results). In brief, LMEs revealed that higher values of \({\alpha }_{0}\) and \({\alpha }_{\infty }\) each predicted greater accuracy in the task. Higher values of H6 decision noise predicted steeper improvements in accuracy over time for equal information games, while higher values of \({\alpha }_{0}\) predicted steeper improvements in unequal information games. We did not observe a relationship between H6 information bonus values and accuracy in unequal information games (F(1,758.2) = 1.72, p = 0.191, \({\eta }_{p}^{2}\) < 0.01, b = 0.002, CI = [−0.001, 0.005]). Nor did we observe that H6 information bonus values related to greater improvements over time, although the latter result was trending in the expected direction (F(1,977.1) = 3.27, p = 0.071, \({\eta }_{p}^{2}\) < 0.01, b = 0.001, CI = [−0.000, 0.002]).

Directed exploration and learning rates were predicted by cognitive reflectiveness

As a secondary aim, we sought to replicate and extend our prior results linking exploration to cognitive reflectiveness9. To do so, we tested LMEs predicting model parameters based on Cognitive Reflection Test (CRT) scores, accounting for potential effects of group and resistance. Across all participants, CRT score significantly predicted DE (F(1,107.0) = 3.95, p = 0.050, \({\eta }_{p}^{2}\) = 0.04, b = 0.344, CI = [0.001, 0.683]) and \({\alpha }_{0}\) (\({\chi }^{2}\)(1) = 7.04, p = 0.008, b = 0.699, CI = [0.182, 1.215]), but not \({\alpha }_{\infty }\) (F(1,109.7) = 0.13, p = 0.719, \({\eta }_{p}^{2}\) < 0.01, b = 0.002, CI = [−0.011, 0.015]) or RE (F(1,107.0) = 0.06, p = 0.811, \({\eta }_{p}^{2}\) < 0.01, b = 0.021, CI = [−0.152, 0.195]). Notably, when additionally controlling for working memory, CRT remained a significant predictor of \({\alpha }_{0}\) (\({\chi }^{2}\)(1) = 7.24, p = 0.007, b = 0.683, CI = [0.173, 1.104]), while the result became marginal for DE (F(1,101.0) = 3.81, p = 0.054, \({\eta }_{p}^{2}\) = 0.04, b = 0.347, CI = [−0.001, 0.696]).

For analogous models restricted to the MUD sample, CRT score did not significantly predict DE (F(1,51.0) = 0.23, p = 0.635, \({\eta }_{p}^{2}\) < 0.01, b = 0.150, CI = [−0.465, 0.765]), RE (F(1,51.0) = 0.48, p = 0.493, \({\eta }_{p}^{2}\) < 0.01, b = 0.114, CI = [−0.210, 0.438]), or \({\alpha }_{\infty }\) (F(1,51.4) = 1.77, p = 0.189, \({\eta }_{p}^{2}\) = 0.03, b = 0.018, CI = [−0.008, 0.045]); however, the effect was marginal for \({\alpha }_{0}\) (F(1,51.0) = 3.68, p = 0.061, \({\eta }_{p}^{2}\) = 0.07, b = 0.066, CI = [−0.003, 0.134]). Note that, unlike in the full sample, in the MUD group alone the distribution of \({\alpha }_{0}\) values was sufficiently normal to use an LME in place of logistic regression.

Cognitive reflectiveness accounts for group differences in directed exploration and learning rates

The group difference in computational parameters observed above motivated us to test mediation models in which lower values of DE and \({{{\rm{\alpha }}}}_{0}\) in individuals with MUD might be explained by lower reflectiveness (CRT scores). These mediation models included group as the predictor variable and either DE or \({{{\rm{\alpha }}}}_{0}\) as the outcome variable, with CRT score as the potential mediator. In the mediation model predicting DE (total effect c = −0.73, p = 0.024, CI = [−1.36, −0.10]), we observed a significant indirect effect (ab = −0.50, p = 0.045, CI = [−1.02, −0.01]), and a non-significant direct effect (c′ = −0.23, p = 0.567, CI = [−1.02, 0.54]), supporting CRT as a mediator. In the mediation model for \({\alpha }_{0}\) (total effect c = −0.14, p < 0.001, CI = [−0.20, −0.09]), we also observed a significant indirect effect (ab = −0.062, p = 0.004, CI = [−0.11, −0.02]), and a significant direct effect (c′ = −0.083, p = 0.014, CI = [−0.15, −0.02]), supporting CRT as a partial mediator. This suggested that lower levels of reflectiveness on uncertainty in individuals with MUD may contribute to the lower levels of DE and slower learning rates observed in this group.
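As a consistency check on the mediation estimates reported above, recall that in a linear mediation model the total effect is the sum of the direct and indirect effects. The notation below follows the usual convention (c for the total effect, c′ for the direct effect, ab for the indirect effect); treating the reported estimates as additive in this way is an assumption, and it holds only approximately for bootstrapped or non-linear models.

```latex
\[
c = c' + ab
\]
\[
\text{DE:}\qquad c' + ab = -0.23 + (-0.50) = -0.73
\]
\[
\alpha_{0}:\qquad c' + ab = -0.083 + (-0.062) = -0.145
\]
```

The DE sum matches the reported total effect of −0.73 exactly, and the \({\alpha }_{0}\) sum agrees with the reported total effect of −0.14 to within rounding.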

Discussion

In the present study, we compared how treatment-seeking (currently abstinent) individuals with methamphetamine use disorder (MUD) and healthy comparisons (HCs) differed in information-seeking and learning under uncertainty, both with and without a somatic anxiety induction. This allowed us to distinguish the causal effect of state anxiety from potential effects of other factors linked to psychopathology. As expected, we found that HCs outperformed individuals with MUD on the task. Computational modeling revealed that individuals with MUD had lower values of directed exploration (DE), random exploration (RE), initial learning rate (\({\alpha }_{0}\)), and asymptotic learning rate (\({\alpha }_{\infty }\); controlling for \({\alpha }_{0}\)), while these parameters themselves were only weakly correlated.

The differences observed in DE and RE support previous research finding that individuals with substance use problems exhibit reduced exploration51,52. Importantly, however, unlike several previous studies, the Horizon Task allowed us to distinguish directed from random strategies, where measures of DE and RE in this task are also sensitive to beneficial vs. suboptimal engagement in exploratory behavior (i.e., with vs. without future choices that could benefit from information gain). Here, DE differences in individuals with MUD appeared to reflect an attenuated ability to increase exploration when it was beneficial (i.e., in games with a longer horizon) and a greater motivation to avoid uncertain options. In contrast, there was not clear evidence for this pattern of change in exploration between horizon conditions with respect to RE. Group differences in RE were also no longer significant after potential outliers were removed and were not supported by complementary Bayesian analyses; these results should therefore be treated with caution. Overall, these findings offer insights into more specific cognitive mechanisms that might contribute to maladaptive choice and withdrawal avoidance.

The differences observed in initial and asymptotic learning rates are largely consistent with previous literature (reviewed in ref. 53) and suggest individuals with MUD update their beliefs more slowly after observing each new outcome, representing a possible overconfidence in prior beliefs. This could be taken to reflect a greater expectation that mean reward values would remain stable. Alternatively, it could indicate the belief that each observed outcome would be less informative (i.e., more noisy, less trustworthy) with respect to the true value of the underlying reward mean54. In real-world contexts, slower learning rates can prevent individuals from changing their behavior, despite experiencing harmful consequences. However, it remains to be shown whether such results generalize to learning in daily life in this population.

Contrary to our hypothesis, we did not observe significant effects of the somatic anxiety induction on any computational measure (although see ref. 28 for interactions observed in comparisons to an affective disorders sample). Here, it is notable that, while previous research has shown negative correlations between trait anxiety and DE8,9, this study causally manipulated somatic state anxiety to differentiate its influence from trait factors in individuals with MUD. While this could indicate that state anxiety does not account for differences in information-seeking, it is possible that the resistance level, which was chosen to maintain tolerability, did not induce sufficiently high anxiety. Self-reported anxiety was higher for the task run with resistance (Mean = 3.05) than without resistance (Mean = 1.46), and this effect was greater for individuals with MUD than HCs, but anxiety scores were still well below the maximum score of 10. Future work might therefore aim to induce higher levels of somatic anxiety in a feasible manner and reassess its potential effects. Incorporating trait measures of interoceptive awareness could also help clarify whether the relationship between somatic anxiety and decision-making is altered in those with higher sensitivity. On the other hand, our results do appear consistent with some prior work showing no change in model-based planning after anxiety induction55. To the extent that DE depends on model-based processes, these results could point in a similar direction.

In line with our secondary aim, results replicated prior findings9 linking cognitive reflectiveness to DE and showed a further association with initial learning rate. Mediation analyses suggested that group differences in DE and initial learning rate might be accounted for by differences in cognitive reflectiveness. This suggests that lower reflectiveness may reduce adaptive information-seeking in individuals with MUD and interfere with learning in uncertain environments. It should be stressed, however, that these analyses on trait reflectiveness were cross-sectional and do not support causal inference. They simply highlight shared explanatory variance between these measures that could offer additional insights. We also note that the relationship between cognitive reflectiveness and model parameters was not observed in the MUD group alone. This could be due to insufficient sample size, the lower values and restricted range of reflectiveness scores in the MUD group, or perhaps a mechanism whereby substance use decouples these variables. Future work should examine whether improving reflectiveness could promote more adaptive information seeking and learning, and whether this might be clinically beneficial. This possibility is supported by previous work showing that cognitive reflectiveness can be improved with training25,26.

Limitations

It is important to consider limitations of the present study when interpreting these results. First, our sample size was only moderate and did not enable us to reliably detect small effect sizes. We also could not determine whether observed group differences represent a pre-existing vulnerability factor or a consequence of methamphetamine use. No relationships were found with length of abstinence, days since starting treatment, or medication status, perhaps suggesting that group differences were better explained by pre-existing factors or were insensitive to recovery; however, longer recovery times will need to be examined. The presence of affective symptoms or other comorbid substance use disorders also did not appear to account for any results.

With these limitations in mind, we found that individuals with MUD exhibited lower levels of exploration and reduced learning rates when making decisions under uncertainty. Contrary to expectation, we did not observe an effect of aversive interoceptive state induction (and resulting increases in somatic anxiety) on model parameters or other behavioral metrics, suggesting trait factors may be of more central importance. Overall, these results highlight directed exploration and learning rates, and underlying uncertainty estimation processes, as possible mechanisms of maladaptive choice in individuals with MUD and may point to specific treatment targets that could be tested in future work.