Main

Two mental processes are essential across all types of addictions: craving, the strong subjective desire for a substance; and decision-making, the objective choices made by affected individuals. While it is collectively acknowledged that craving and decision-making are tightly intertwined in addiction, the two literatures have largely remained in silo, and a computational mechanism linking the two processes is missing. Historically, the cue-reactivity literature—one of the most influential characterizations of craving—emphasizes the elicitation of craving in response to learned addictive cues that serve as secondary drug rewards1,2,3,4,5,6,7. Cue-reactivity paradigms have been widely used to identify the neural correlates of craving (for example, midbrain, insula and cingulate) across a number of addictive disorders and sensory modalities8,9,10,11. Yet they do not provide a mechanistic explanation for how craving arises or interacts with drug-related cues and choice behaviors. For example, forced abstinence and associated removal of drug cues paradoxically leads to increased craving and drug-seeking behaviors in substance-dependent rodents12,13 and humans14,15, a phenomenon termed ā€˜incubation of craving’. Furthermore, it has been shown that drug-related beliefs and expectations also affect craving, an effect independent from the availability of drug rewards and cues16,17,18. As such, despite a rich empirical literature on craving, the computational mechanisms of craving remain elusive.

For the decision-making aspect of addiction, reinforcement learning (RL) has been a primary framework used to account for maladaptive choices, with a central tenet that addictive choices are reinforced by reward prediction errors (RPEs). Preliminary computational models hypothesized that addictive stimuli produce an irreducible RPE signal, subserved by excessive dopamine, that continuously reinforces substance-related choices19, which then subsequently shift one’s homeostatic set points20. While views on heterogeneity of dopaminergic encoding of information have become more nuanced21,22,23, and modern accounts have provided compelling evidence for RL-based behavior in animal models of addiction, these theories have yet to be tested empirically in humans with substance dependence. Critically, they also still do not account for how drug-related choices and craving may mutually influence each other. Recent efforts in computational psychiatry have started to shed light into the interaction between value-based decision-making and subjective states such as mood24,25,26,27,28,29. For example, monetary RPEs were found to predict mood ratings, providing initial evidence that internal subjective states could be influenced by RL signals in a systematic fashion28. Conversely, mood may increase the ā€˜momentum’ of value updating, providing a plausible mechanism for how mood drives dynamic changes in learning24. Momentary craving is probably entangled with addictive decision-making in similar ways, yet only a handful of empirical studies have examined this important relationship16,30, and a computational mechanism linking these two constructs remains noticeably missing despite well-established theoretical and empirical accounts of addiction that connect the two19,31,32,33.

In this study, we tested the hypothesis that momentary substance craving and value-based decision-making shape each other in a bidirectional fashion in humans. To test this hypothesis, we developed a paradigm in which substance-using individuals made choices to obtain either monetary or addictive cue (that is, alcohol or cannabis) outcomes and intermittently self-reported their craving (that is, for alcohol or cannabis) during both blocks. To test the generalizability of our hypothesis, we examined two groups of participants (Fig. 1; total n = 132; see Table 1 for participant characteristics): alcohol drinkers (n = 68) and cannabis users (n = 65). The task consisted of a modified two-armed bandit (Fig. 2a), where participants selected one of two machines (80% reward rate) and saw the outcome of either a coin (in the monetary condition) or their pre-selected addictive cue of either alcohol or cannabis (in the addictive condition). Momentary craving and mood were both sampled during the task (33% and 20% of the trials, respectively), and a computational modeling approach was used to fit both choice and craving data per session. We found that, across both groups, momentary craving biased learning rate in the addictive context, but biased outcome perception in the monetary context. Conversely, in both addictive and monetary contexts, elicited craving was driven by outcomes and expected values (EVs). Finally, we found that computational parameters derived from our models provided added utility (in addition to demographics and model-agnostic metrics) for predicting alcohol, but not cannabis, addiction risk. Together, these results validate a computational mechanism linking momentary craving with value-based decision-making while acknowledging divergence in the computational fingerprints between alcohol- and cannabis-using groups.

Fig. 1: Participant recruitment flow chart.
Fig. 1: Participant recruitment flow chart.
Full size image

All participants were recruited from an online data collection platform called Prolific. We screened 1,000 users for the following characteristics: USA resident, fluent English speaker, high approval rating on Prolific, no previous completion of any of our group’s experiments, balanced on sex. Participants were screened with the World Health Organization’s ASSIST, a questionnaire that allows for screening of use of substances such as alcohol, cannabis, tobacco, stimulants, inhalants and others. We also collected a lab-standard demographics survey to assess basic demographic features such as age, sex, level of education, income and race. From the screening surveys, we recruited cohorts for moderate to high alcohol and cannabis risk scores as identified by ASSIST (>3 for alcohol, >4 for cannabis). From the eligible candidates, we collected data for the slot machine game in 2 separate batches (over 2 days), resulting in 100 alcohol users and 94 cannabis users. Participants with substance use were additionally asked to complete further group-specific questionnaires (ADS and AUDIT for alcohol users, SDS for cannabis users). Finally, data from each group underwent a series of quality control checks. We screened out participants that failed the attention checks during the experiment, had less than 50% optimality of machine selected and minimal to no variability in the rating scores. A total of 68 participants were included in the alcohol group and 65 in the cannabis group.

Fig. 2: Experimental paradigm and model-agnostic task behavior.
Fig. 2: Experimental paradigm and model-agnostic task behavior.
Full size image

a, Participants played a modified two-armed bandit task. Each session started with instructions and a choice of three outcome options, of which the participant selected the one most tempting to them. Following 5 practice trials, 2 blocks of 60 trials were presented, in which the participant selected 1 of 2 machines. In the money block, the outcome presented was an image of a coin. In the addictive cue block, the outcome presented was the option selected by the participant at the start. Craving and mood were intermittently assessed throughout the block (20 craving ratings, 12 mood ratings), scored from 1 to 50. The optimal machine in each trial provided a winning outcome with 80% probability, and the optimal machine switched four times (four reversals) each block. Reversal order was pseudorandomized. Choice optimality was calculated as the proportion of choices that matched the optimal choice at each trial. b, Choice behavior was assessed visually by averaging participant choices across the experiment. Blue lines represent the alcohol group, yellow lines represent the cannabis group, and vertical gray lines represent the timings of reversals in task structure (shaded region represents the 95% confidence interval). In both groups, participants were able to successfully learn the reversal structure, reflected in the periodic switches of left/right machine choices over the experiment. c, Choice optimality was calculated for each participant and stratified by group and condition. In both groups, choice optimality was significantly higher than chance (50%; alcohol, n = 68, t = 18.9, P < 0.001; cannabis, n = 65, t = 18.2, P < 0.001). d, Average craving ratings were significantly higher in the addictive cue condition compared with the money condition in both groups (alcohol, n = 68, t = 3.14, P = 0.002; cannabis, n = 65, t = 2.47, P = 0.016). e, Variances in craving ratings were not significantly different between conditions for either group (alcohol, n = 68, t = 0.793, P = 0.431; cannabis, n = 65, t = 0.841, P = 0.403). Two-sided repeated sample t test without correction was used for all significance testing. For all box plots, the bottom and top of the box are bounded by the 25% and 75% percentile values, respectively, with the median value represented as the center line and whiskers representing 1.5 times the interquartile range. *P < 0.01, **P < 0.001. NS, not significant.

Table 1 Participant characteristics

Results

Participants learned to maximize outcomes overall and reported stronger cravings in the addictive condition

First, we examined model-agnostic behaviors that reflected how participants performed the task. In typical monetary bandit tasks, it is well established that humans learn to choose the option that maximizes monetary outcomes34,35,36; yet it remains unclear whether individuals with moderate to heavy substance use behave similarly in the addictive stimulus condition. In this task, participants were instructed to seek block-specific winning outcomes (either addictive or monetary cues) and were paid a monetary bonus at the end of the task based on their final score. As in standard monetary tasks, here we defined choice optimality as the percentage of choosing the correct option (that is, choosing the machine with the higher reward rate of 80%). Overall, participant choices were highly similar to the true reversal structure of the task (Fig. 2b), and choice optimality was significantly higher than chance (50%) regardless of condition (Fig. 2c; alcohol/money: 71 ± 9%; alcohol/addictive: 70 ± 8%; cannabis/money: 69 ± 8%; cannabis/addictive: 70 ± 8%; all P < 0.001), confirming that participants successfully learned to exploit the machines for both addictive and monetary outcomes. Choice optimality did not differ across participant groups (t = 1.01, P = 0.31) or task conditions (t = 0.38, P = 0.71) or show an interaction effect (F = 0.44, P = 0.51), ensuring that findings related to craving-choice computation would not be attributable to distinct task performance alone. In addition, win–stay and lose–shift probabilities did not differ significantly between groups (Supplementary Fig. 1), suggesting that these model-free metrics alone did not reliably differentiate and characterize behavioral differences between alcohol drinkers and cannabis users.

Next we examined self-reported cravings during the task across groups and conditions. As predicted, substance craving was greater in the addictive than in the monetary condition in both groups (Fig. 2d; alcohol: t = 3.141, P = 0.002; cannabis: t = 2.465, P = 0.016), suggesting that addictive cues increased craving in this task, replicating cue-induced effects on craving. Substance craving did not differ substantially between alcohol and cannabis groups in either the addictive (t = 0.067, P = 0.946) or the monetary (t = 0.218, P = 0.828) condition. We also examined whether craving levels fluctuated during the task by examining the variability of craving ratings (that is, variance). We observed substantial variability in craving ratings within subjects (Fig. 2e) such that craving variances were greater than zero for across groups and conditions (all groups and conditions: t > 7, P < 0.001), confirming dynamic changes in perceived craving in response to outcomes. Variances did not differ by group (F = 0.26, P = 0.61), condition (F = 1.07, P = 0.30) or their interaction (F = 0.17, P = 0.68).

Finally, we also examined whether participants’ craving ratings might be inherently correlated with mood as negative affect has been found to be associated with increased drug craving37,38. To assess this, participant mood ratings were measured intermittently (ā€œwhat is your mood right now?ā€) for 20% of the trials. Within-participant craving and mood ratings were not significantly correlated in the money condition (Supplementary Fig. 2; mean correlations; alcohol: r =ā€‰āˆ’0.06, P = 0.31; cannabis: r =ā€‰āˆ’0.002, P = 0.96) or in the addictive cue condition (alcohol: r =ā€‰āˆ’0.02, P = 0.70; cannabis: r = 0.06, P = 0.31), indicating that craving ratings contained distinct information from mood ratings.

Momentary craving biased drug-related learning across cannabis and alcohol groups but in opposite directions

Next we constructed computational models that represent the bidirectional relationship between momentary craving and choice behavior (see Methods and Supplementary Table 1 for details). First, we composed five candidate model classes to account for choice behavior, with a modulation parameter (φ) defining the degree to which momentary craving modulated different components of the decision process (temporal difference reinforcement learning, no bias (TDRL), outcome bias (o-bias), learning-rate bias (α-bias), temperature bias (β-bias) and momentum-based TDLR (m-TDRL)). The first four models were derived from classic TDRL models39,40 from the RL literature. In the o-bias model, momentary craving modulated the perceived magnitude of the outcome signal41,42, while it instead modulated the learning rate and softmax temperature parameters in the α-bias and β-bias, respectively. The m-TDRL model conceptualized craving as ā€˜momentum’, similar to recent efforts in modeling mood dynamics24,43,44. Finally, we also included a heuristic model that did not track values but instead employed a simple noisy win–stay, lose–shift (WSLS) strategy to make decisions.

Of our candidate models linking craving and valuation, which best explains the choices made by participants? Following model comparison (Fig. 3a,b), we found that the α-bias model performed best in the addictive cue condition across both alcohol- and cannabis-using groups. To assess the fidelity of fit by this model, we generated 2,000 simulations of choice behavior and calculated the degree of alignment between simulations and true behavior. We found that simulated behavior matched true behavior significantly better than chance (Supplementary Fig. 3; both conditions: t > 10, P < 0.001) and close to optimal (Supplementary Fig. 4), with excellent parameter recovery (Fig. 3c,d). Examination of the parameter values for this model (Fig. 3e,f) revealed that φ was positive in alcohol users (M = 0.209, s.d. = 0.798, P = 0.034) and negative in cannabis users (M =ā€‰āˆ’0.995, s.d. = 1.435, P < 0.001), with a significant difference between the two groups (Fig. 3i; t = 6.01, P < 0.001). This result suggests that higher craving accelerated alcohol-related learning for alcohol drinkers but slowed cannabis-related learning for cannabis users. In other words, alcohol craving increased one’s sensitivity toward alcohol-related RPEs, whereas cannabis craving reduced sensitivity to cannabis-related RPEs. In addition, the learning rate α was also lower in alcohol users (Fig. 3g; M = 0.736, s.d. = 0.068) compared with cannabis users (M = 0.801, s.d. = 0.084; t = 4.90, P < 0.001), indicating that chronic alcohol use and cannabis use might have exerted opposing effects on the unmodulated baseline learning rate. Inverse temperature did not show differences between groups (Fig. 3h; t = 0.521, P = 0.604).

Fig. 3: Decision-making model comparison and parameter distribution: addictive condition.
Fig. 3: Decision-making model comparison and parameter distribution: addictive condition.
Full size image

a,b, For all models, Ī”BIC was defined as the difference between each model’s BIC and the best-performing BIC in the alcohol group (a) and cannabis group (b). The α-bias model performed best across groups. c,d, For the best-performing α-bias model, we performed parameter recovery by simulating data from the parameter estimate and refitting the simulated data in the alcohol group (c) and the cannabis group (d). Pearson correlation between original and recovered parameter estimates was used to perform significance testing. All parameters (α, β, φ) displayed excellent parameter recovery across groups (P < 0.01 across parameters and groups). e,f, Distributions of parameters were extracted from the α-bias model for the alcohol group (e) and cannabis group (f). The left panels display the joint distributions of α (learning rate) and φ (modulation factor) as these interact directly in the model, while β (inverse temperature) is visualized separately. A two-sided one-sample t test without correction was used for all significance testing. In the alcohol group, φ was found to be significantly positive (t = 2.159, P = 0.034), while in the cannabis group, it was found to be significantly negative (t =ā€‰āˆ’5.590, P < 0.001). g, Base learning rate (α) was higher in cannabis users versus alcohol users (t = 4.90, P < 0.001). h, There was no difference between groups for inverse temperature (t = 0.521, P = 0.604). i, Modulation parameter φ was significantly higher in alcohol users versus cannabis users (t = 6.01, P < 0.001). A two-sided independent-sample t test without correction was used for all significance testing for g and h.

In the monetary condition, by contrast, the o-bias model performed best across groups (Supplementary Fig. 5a,b), and parameter recovery for this model was, again, excellent (Supplementary Fig. 5c,d). Examination of parameter values (Supplementary Fig. 5e,f) revealed that both groups showed significantly positive φ (alcohol: M = 0.118, s.d. = 0.099, P < 0.001; cannabis: M = 0.190, s.d. = 0.097, P < 0.001), indicating that, across groups, higher craving levels increased the perceived magnitudes of monetary outcomes. Per-group modulation parameters drawn from α-bias models for each condition also demonstrated a significant difference between conditions (Supplementary Fig. 6; P < 0.001) although these must be interpreted with caution as the α-bias was best performing only in the addictive condition.

To test the specificity of these observed effects, we also collected two additional control datasets. First, we collected a non-substance-using healthy control (HC) dataset, where one HC group performed the alcohol version of the task (n = 18), and a second HC group performed the cannabis version of the task (n = 22). Second, we collected a ā€˜crossover’ control dataset, where a sample of cannabis users performed the alcohol version of the task (n = 20), and vice versa (n = 18). Supplementary Fig. 7 summarizes the modeling results for alcohol cue task across all groups (HC, alcohol group and the crossover cannabis), while Supplementary Fig. 8 summarizes results for the cannabis cue task across the same three comparison groups. In the two HC groups, the no-bias TDRL model, instead of the α-bias model, provided the best fit (Supplementary Figs. 7a,b and8a,b), suggesting that craving did not bias non-substance users’ decision-making. Similarly, in the crossover dataset, the α-bias model performed worse than either the no-bias TDRL model or the o-bias model (Supplementary Figs. 7e,f and 8e,f), suggesting the substance specificity of craving’s effects on decision-making. Together, these findings support the notion that the impact of craving on learning is substance specific; that is, learning in substance users is biased by cues related to the primary substance they use rather than by cues from other substances.

Together, these findings highlight an important role for craving in modulating learning across alcohol and cannabis groups. First, momentary craving biases learning rate in response to addictive cues yet influences outcome perception in response to non-addictive cues across both groups. Second, alcohol craving and cannabis craving have opposing effects on drug-related learning; alcohol craving accelerates alcohol-related prediction-error encoding, while cannabis craving reduces learning based on cannabis-related prediction errors. Third, craving-driven learning is substance specific: in crossover conditions, learning is not biased by cues from a different substance, and in non-users, substance cues do not bias learning at all. These models provide overlapping yet distinct computational mechanisms mediating the relationship between craving and decision-making in alcohol drinkers and cannabis users.

Trial-wise expected value and outcome combine to drive perceived craving across groups and decision contexts

Although our results demonstrate a directional effect of craving on learning, the nature of the reverse interaction remains unclear; that is, how do prior expectations and outcomes influence perceived craving? To this end, we constructed models where different components of decision variables and their combinations contributed to future craving ratings. These models were inspired by (1) a recently proposed theoretical framework that craving arises as a posterior inference stemming from both prior expectations and prediction errors generated by outcomes45,46 and (2) computational models of other types of subjective states such as mood driven by various value signals28. Four model classes were constructed and compared: RPE-elicited models (where only prediction errors influenced craving), EV-elicited models (where only expected values influenced craving), EV- and outcome-elicited craving models, and RPE- and EV-elicited craving models. Note that we did not include a model of both outcomes and RPEs as these two variables were highly correlated (Supplementary Figs. 9 and 10).

Model comparison revealed that, across both alcohol and cannabis groups, momentary craving was best explained by a combination of expected values and outcomes in both addictive and monetary conditions (Fig. 4a,b). Predicted cravings generated by this model were significantly correlated with true cravings (Fig. 4c–f; t > 11.0, P < 0.001). Overall, these results build on several recent findings substantiating the influence of both drug cues and expectations on momentary craving16,30.

Fig. 4: Craving model comparison and parameter distribution: addictive condition.
Fig. 4: Craving model comparison and parameter distribution: addictive condition.
Full size image

a,b, As with decision models, Ī”BIC was defined as the difference between each model’s BIC and the best-performing BIC in the alcohol group (a) and cannabis group (b). The joint outcome + EV model performed best across groups. c–f, In the alcohol group, for the best performing craving model, we calculated the correlations between model-predicted craving and true craving ratings. An example alcohol participant’s true (black line) versus predicted craving (blue line) is displayed. Two-sided one sample t test without correction was used for all significance testing (c). There was a high degree of correlation across participants in the alcohol group (mean r = 0.446), indicating strong model efficacy (d). Similarly, in the cannabis group, we visualize an example participant’s true (black line) versus predicted craving (yellow line) (e). There was a high degree of correlation across participants in cannabis group (mean r = 0.446; f). g,h, Distributions of parameters were extracted from the outcome + EV model. The left panel displays the joint distributions of outcome weight (\({w}_{\mathrm{outcome}}\)) and EV weight (\({w}_{\mathrm{EV}}\)), while craving baseline is visualized separately. In the alcohol group (g), \({w}_{\mathrm{outcome}}\) (t = 4.322, P < 0.001) was significantly positive and \({w}_{\mathrm{EV}}\) (t =ā€‰āˆ’0.256, P = 0.798) was not significantly different from zero. In the cannabis group (h), neither parameter reached significance (\({w}_{\mathrm{outcome}}\), t = 1.603, P = 0.114; \({w}_{\mathrm{EV}}\), t = 1.298, P = 0.199). i–k, There were no significant group differences in baseline craving (t = 0.329, P = 0.776) (i), outcome weight (t =ā€‰āˆ’0.911, P = 0.364) (j) or EV weight (t = 1.109, P = 0.255) (k). Two-sided independent-sample t test without correction was used for all significance testing in i–k.

Next we extracted the parameters from the best-performing model to interpret the processes underlying elicitation of craving during the task. In the addictive cue condition, outcome weight was significantly positive in alcohol users (Fig. 4g; M = 0.167, s.d. = 0.319, P < 0.001) but not cannabis users (Fig. 4h; M = 0.101, s.d. = 0.506, P = 0.114). EV weight was not significant in alcohol users (Fig. 4g; M =ā€‰āˆ’0.005, s.d. = 0.161, P = 0.798) or cannabis users (Fig. 4h; M = 0.026, s.d. = 0.162, P = 0.199). Parameter values did not significantly differ between groups (Fig. 4i–k).

In the monetary condition, the combined outcome + EV model was again found to be best performing (Supplementary Fig. 11a,b), and predicted cravings were highly correlated with true cravings (Supplementary Fig. 11c–f). Analysis of parameter estimates (Supplementary Fig. 11e,f) showed that outcome weight was significantly positive in cannabis users (M = 0.121, s.d. = 0.298, P = 0.002) but not in alcohol users (M = 0.053, s.d. = 0.459, P = 0.349), while EV weight was significantly negative in alcohol users (M =ā€‰āˆ’0.047, s.d. = 0.168, P = 0.026), but not in cannabis users (M = 0.0.043, s.d. = 0.228, P = 0.132).

In summary, the models constructed here provide a means for disentangling two important components of momentary perceived craving: expectation-induced effects (that is, EVs) and cue-induced effects (that is, outcomes). Here we again found highly divergent computational signatures for alcohol and cannabis users that were context dependent. In the addictive cue condition, momentary craving was dynamically driven by increases in cue-induced influence for alcohol users, while cannabis users did not show strong consistency across participants. Conversely, in the monetary condition, momentary craving was consistently driven by increases in cue-induced effect for cannabis users, while alcohol users instead showed decreases driven by EV.

Model-derived computational parameters were useful in predicting alcohol, but not cannabis, addiction risk scores

Thus far, our results provided a computational account for the bidirectional relationship between substance craving and decision-making. Next, we sought to examine whether these computational estimates had additional utility in predicting clinical risk for developing addiction beyond simple demographics or model-agnostic metrics. Most similar analyses in the literature thus far conduct simple correlations between computational estimates and clinical scores, which does not necessarily demonstrate the added value of computational modeling and often suffers from multiple comparison issues. As such, we sought to conduct formal comparison between six classes of regression models that contain full sets of computational, demographic and model-agnostic variables (Supplementary Table 2): (1) demographic regression (Demo-only), in which only basic demographics (age, sex, race, income and education level) were used to to predict risk, (2) computational model-derived regression (Comp-only), in which only computational parameters from the addictive condition were used, (3) model-agnostic regression (Agnostic-only), in which only task performance summary metrics (mean and s.d. of craving and choice optimality in the addictive condition) were used, (4) Demo + comp, where both demographic and computational predictors were included, (5) Demo + agnostic, where both demographic and model-agnostic predictors were included, and (6) Demo + comp + agnostic, where demographic, computational and model-agnostic predictors were all included.

For each group, models were compared and ranked by expected log pointwise predictive density Widely Applicable Information Criteria (elpd_waic) scores, and normalized true and predicted addiction risk scores were plotted against each other (Fig. 5a–d). We found that the full Demo+comp+agnostic model performed best in alcohol users (elpd_waic =ā€‰āˆ’92.992; r = 0.545, P < 0.001), while Demo-only performed best in the cannabis users (elpd_waic =ā€‰āˆ’94.188; r = 0.372, P = 0.003), suggesting that computational parameters provided added utility for predicting alcohol, but not cannabis, addiction risk. We also sought to interpret the significantly predictive variables from the best-performing regression model in relation to addiction risk scores (Fig. 5e,f). Alcohol addiction risk scores were positively associated with learning rate, baseline craving and outcome weight, and negatively associated with inverse temperature, standard deviation of craving and (marginally) modulation parameter. Cannabis addiction risk, however, was negatively associated only with income and marginally with age and education. In both groups, our multivariate regression approach revealed parameters of interest not discovered by a univariate correlation analysis with model-free parameters (Supplementary Fig. 12).

Fig. 5: Clinical risk prediction analysis.
Fig. 5: Clinical risk prediction analysis.
Full size image

Regression analyses were conducted to determine the efficacy of Comp-only, Agnostic-only, Demo-only, Demo + comp, Demo + agnostic and Demo + comp + agnostic in predicting clinical risk. Note that ASSIST scores were normalized before regression. Pearson correlation between true and predicted ASSIST scores was used to perform significance testing. a, In the alcohol group, Demo + comp + agnostic outperformed all alternatives (elpd_waic =ā€‰āˆ’92.99). b, In cannabis users, Demo-only was the best-performing model (elpd_waic =ā€‰āˆ’94.032). c,d, The best-performing models in both groups generated predictions that were highly correlated with true ASSIST scores (alcohol (c), r = 0.545, P < 0.001; cannabis (d), r = 0.372, P = 0.003; the shaded region represents the 95% CI). e,f, Parameters were extracted from the samples from the posterior distribution, and the 89% highest density intervals are plotted for each significant variable. Weight distributions for race and sex were eliminated due to a high degree of variance and low contribution to prediction. In alcohol users (e), α (learning rate), craving baseline and \({w}_{\mathrm{outcome}}\) were found to be significantly positively associated, while β (inverse temperature) and standard deviation of craving were found to be significantly negatively associated, with ASSIST scores. In the cannabis group (f), income was negatively associated, and there was a marginally negative association for age and education as well.

In summary, our comparative regression analysis unexpectedly found that computational parameters from our models were substance dependent in their predictive utility. This may indicate that the direct utility of computational fingerprints of decision-making and craving in predicting addition risk may be highly dependent on the type of substance or on a the particular set of computational latent parameters interrogated during task performance.

Discussion

In this study, we built a computational framework subserving the bidirectional relationship between momentary craving and decision-making in alcohol drinkers and cannabis users. Our findings support the notion that craving and decision-making are two computationally intertwined processes across addictive domains. In addition, we built on, and empirically tested, recent computational theories that substance-related expectations and outcomes both play a role in craving. Substantiating our hypotheses, we found that alcohol and cannabis groups converged on a common algorithm in which momentary craving dynamically biased learning specifically in response to addictive cues, and both expected values and outcomes influenced momentary craving. Notably, however, alcohol drinkers and cannabis users also diverged with respect to the patterns of parameters associated with decision-making and craving, providing distinct substance-specific computational fingerprints of these interacting mental processes in addiction.

Our modeling of choice behaviors provides strong evidence for a shared computational mechanism in alcohol and cannabis users. Importantly, this modulatory effect was also cue specific: while craving modulated learning rate in response to addictive cues, it instead modulated the magnitude of reward in response to monetary cues. This effect was also present in both groups. Moreover, these effects were not present in either healthy controls or substance users who were exposed to cues that were not their primary substance of choice. Together, these results suggest craving-induced modulation of decision-making specifically affects the primary substance used by the individual and that such modulation applies to both alcohol and cannabis groups examined in this study. There already exists compelling evidence supporting the view that computational mechanisms involved in addiction may be shared across substance-use disorders, including alcohol and cannabis use. For example, previous studies have highlighted several mechanisms for impaired goal-directed planning, belief-updating and habit formation across substance use disorder groups41,42,47,48. Network analytic approaches have also shown that craving plays a central role across substance use disorders49,50,51, and individuals who co-use alcohol and cannabis exhibit heightened cue-induced craving and altered decision-making for both substances52,53,54. Neurobiologically, several connectome-based predictive approaches have identified a transdiagnostic ā€˜craving network’ involving regions of the salience, subcortical and default mode networks55,56,57, suggesting a common neural signature for craving across addictive disorders, including alcohol- and cannabis-use disorders. Building on these previous results, our models provide first evidence that both alcohol and cannabis users converge on the same computational algorithm that explicitly links craving and decision-making.

Importantly, however, alcohol and cannabis users diverge in, and thus are uniquely identifiable by, several computational metrics. On the basis of the construction of the dynamic learning rate, a higher base learning rate with a negative φ parameter (that is, in cannabis users) could lead to a similar learning trajectory as a lower base learning rate and positive φ parameter (that is, in alcohol users); however, the uncoupling of these two latent parameters within our computational model reveals that chronic cannabis use contributes to baseline craving differently than chronic alcohol use. Notably, we found differences in both trait-like parameters (for example, base learning rate) and state-like parameters (for example, φ) influencing momentary substance craving. In alcohol drinkers, we found that increased craving led to faster learning from alcohol-related prediction errors, suggesting that alcohol-associated outcomes that follow a state of high craving may lead to stronger valuation by the brain of a drinker. Several previous studies have identified similar learning dysfunctions in alcohol-use disorder, including premature switching58, accelerated valuation of negative outcomes59 and high impulsivity60, that may be reflective of faster craving-induced learning. By contrast, in cannabis users, high craving led to slower learning from prediction errors about cannabis-related outcomes. This result conforms with previous findings of diminished learning in cannabis users61, particularly as relating to memory62,63 and social context64 of cannabis cues. In addition, while direct comparisons are lacking, several studies have also described clear dissimilarities in computational factors65,66 between alcohol and cannabis users (for example, impaired delay discounting but unaffected reversal learning in cannabis users, but significant reversal learning impairment in alcohol users). Differences in computational signatures may also reflect the disparities in clinical phenomenology between alcohol and cannabis consumption; for example, alcohol is often consumed in shorter, concentrated periods, and its intoxicating effects take longer than those of cannabis, which is often consumed more gradually throughout longer periods and takes only minutes for its effects to be felt. Moreover, while we did not collect neurobiological data, it is well established that these two substances have very different pharmacological effects: cannabis acts on cannabinoid receptors and produces relaxing effects, while alcohol, among other effects, can cause euphoria, excitement and disinhibition. These effects are in line with the behavioral findings observed in our study, although the exact neural mechanisms remain to be tested. Overall, our findings propose drastically diverging alterations in the craving–learning associations present in these two substance-use disorders that, in turn, may guide development of substance-specific clinical interventions. Further exploration and characterization will be necessary to validate these findings prospectively, explore brain substrates and compare/contrast them to the computational signatures of other substance-use disorders (for example, opioid, cocaine).

We also found that, across groups, both expected value and outcome related to the substance—regardless of the decision context—dynamically drove changes in cravings. Importantly, this finding realigns the classic cue-induced craving phenomenon9,67 with newer perspectives on craving based on the Bayesian theoretical framework. In this view, cue reactivity can be deconstructed as being composed of both prior expectations and the actual outcomes related to drugs that jointly drive increased craving; that is, cues elicit craving through two distinct mechanisms: (1) prior beliefs about their value (EV-driven effect) and (2) observed evidence (outcome-driven effect). This result explains recent findings that both prior beliefs and outcomes are important components of perceived craving in addiction at behavioral68,69,70,71, computational41,72,73 and neural16,74,75 levels.

Finally, we found that, among alcohol users, computational parameters can reflect latent information that is predictive of clinical risk beyond model-agnostic metrics or demographics alone, suggesting that these latent computational parameters may contain unique information related to individual differences in alcohol addiction risk scores. This provides empirical evidence that computationally derived parameters may capture features core to addiction that are not characterized by model-agnostic or demographic measures. These findings provide compelling evidence for the value of computational modeling in uncovering latent information that may be highly valuable translationally. Nevertheless, further work will be required to refine the sets of latent computational parameters that may be most salient to the predictions of clinical risk in other addictive disorders.

Our finding of an algorithmic link between craving and decision-making has direct clinical implications: specifically, by reducing or managing cravings, individuals may be able to break the vicious cycle of addiction-associated decision-making that leads to poor clinical outcomes. Although interventions regarding craving have been proposed, tested and examined widely76,77,78, our models provide the first mechanistic explanation for their success. Further exploration may reveal a novel line of therapeutic interventions focusing on the interplay between craving and addiction-associated decision-making that target both components simultaneously. Although recent psychological research has hinted at similar results70,79,80, this finding merits further clinical investigation as the majority of current interventions typically involve focusing on negative aspects of addiction directly71,81,82, or mindfulness strategies83,84,85.

Our results are limited by an absence of direct evidence supporting the neurobiological substrates that facilitate the mental computations proposed in this model; while previous work has examined decision-making effects on subjective states (for example, happiness28 or craving16,74), no neuroimaging studies have yet investigated a bidirectional interaction between them. A next logical step would be to collect neural data (for example, using neuroimaging) to find the neural patterns encoding the computational mechanisms outlined here. Another limitation is that, although overall mood ratings were collected, direct subjective evaluations of cue valence or trial-to-trial arousal were not assessed, potentially contributing to individual- or group-level differences in craving modulation or decision-making. Moreover, our experimental design choice to prime participants with explicit instructions to seek substance-related outcomes may not reflect unprompted or uninstructed behaviors; abstinent or treatment-seeking individuals might experience substance cue outcomes as aversive rather than appetitive, possibly resulting in categorically different computational signatures of craving-modulated learning. Finally, although our sample represents high-risk behaviors, these findings should be replicated in individuals with confirmed substance-use disorders to ensure clinical applicability and to provide a more robust understanding of how the proposed mechanisms operate under pathological conditions.

Conclusion

Overall, this study highlights a reciprocal, dynamic link between decision-making and craving in alcohol users and cannabis users. Our results demonstrate that momentary craving leads to biased learning in drug-related decision-making contexts and that both prior beliefs and prediction errors drive momentary craving, supporting an updated and refined craving framework.

Methods

Participants and data collection

The study was approved by the institutional review board at the Icahn School of Medicine at Mount Sinai (protocol number IRB-18-01301). All participants were recruited from Prolific, an online data collection platform. All participants provided signed informed consent online before participation. The data collection and analysis were compliant with all relevant ethical rules and regulations stipulated by Icahn School of Medicine at Mount Sinai.

Figure 1 illustrates the data collection workflow. We first performed a high-throughput screening of drug and alcohol usage in 1,000 participants with the following characteristics: US residents, fluent English speakers, high approval rating on Prolific and no previous completion of any of our group’s experiments. Participants completed two screening surveys implemented on Redcap. The first was Alcohol, Smoking and Substance Involvement Screening Test (ASSIST)86, which was developed as a screening tool to help primary health professionals detect and manage substance use and related problems. ASSIST allows for screening of use of substances such as alcohol, cannabis, tobacco, stimulants and inhalants. Second, we used a lab-standard demographics survey to assess basic demographic features such as age, sex, level of education, income and race. Participants were paid $1.00 ($12 hrāˆ’1 rate) for full completion of all screening surveys, which took about 5 min on average.

From the screening surveys, we recruited subsets of online participants with moderate to high alcohol and cannabis risk scores using the following criteria. (1) The alcohol-use cohort reported moderate to heavy use of alcohol weekly as identified by ASSIST and met or exceeded the ASSIST score threshold of 3 (moderate alcohol risk). (2) The cannabis-use cohort reported moderate to heavy use of cannabis weekly and met or exceeded the ASSIST score threshold of 4 (moderate cannabis risk; recommended by the ASSIST rubric). Although there was comorbidity in substance use, there was nevertheless a significant difference between ASSIST screening scores between groups (Supplementary Fig. 13). Participants recruited for the slot machine task were paid $4.60 ($12 hrāˆ’1 rate) for full completion of the task, along with a bonus of $0.01 per point in their total score (on average, participants scored 77 points for a total of $0.77 bonus), for a total average earning of $5.37 per participant.

From the full eligible cohort, we first collected 100 participants’ performance for the slot machine game for each group. Participants with substance use were additionally asked to complete further group-specific questionnaires. Alcohol users completed an Alcohol Dependence Scale (ADS)87,88 and an Alcohol Use Disorder Identification Test89. Cannabis users completed a Cannabis Severity of Dependence Scale (SDS)90 and a Cannabis Abuse Screening Test (CAST)91. Six alcohol drinkers and two cannabis users met criteria for dependence (Supplementary Fig. 14). Participants who did not pass data quality control (Quality control) were further excluded from the analysis (26 for alcohol group and 24 for cannabis group), resulting in 68 alcohol drinkers and 65 cannabis users in the main experiment (see Table 1 for their characteristics).

For additional control experiments, we collected a group of 40 non-substance-using HCs (screened with ASSIST survey), who completed the same slot machine game (alcohol cues n = 18, cannabis cues n = 22). We also collected a ā€˜crossover’ group of 36 additional substance users (alcohol drinkers n = 18; cannabis users n = 18), who played the game with the opposite cue (alcohol users with cannabis cues and vice versa).

Instructions

We utilized a modified two-armed bandit task that incorporated reversals to encourage continuous learning during the experiment. At the start of the task, participants were presented with the instructions about how to play the game. Participants were presented with two slot machines and were encouraged to do their best to maximize their wins from the machines, with the incentive of being rewarded with a greater bonus payment at the end of the game based on their final score.

In the money condition, participants received a monetary outcome, presented as an image of a coin. In the addictive cue condition, they received an outcome corresponding to an addictive cue, selected according to the addictive group to which they belonged (for example, in the alcohol-using group, participants were shown an image of alcohol). In addition, during the start of the experiment, participants were able to select one of three possible addictive cues that were most tempting. Participants in the alcohol-using group were able to select from a beer, wine or liquor outcome, while cannabis users were able to select from a blunt, bong or bowl.

Participants were then also informed that they would intermittently be asked to assess and report their craving and mood. They were given specific definitions for craving (ā€œAn intense, conscious desire or wanting for somethingā€) and mood (ā€œa non-specific, persistent general feeling about your current mental state, distinct from emotions, which are shorter-lived and specific to a particular thingā€) to provide them with a deeper understanding of the concepts being measured and to provide a concrete basis upon which their self-reports could be compared. Baseline craving and mood ratings were then assessed before presentation of the slot machine task.

Finally, participants were informed that one of the slot machines would provide more wins than the other at all times, but the better slot machine might change over the course of the experiment. The true experimental structure (delineated below), including the probabilities of winning, the reversal timings and number of trials, was not revealed to the participants beyond the information listed above.

In this task, we aimed primarily to identify the role of craving in modulating optimal decision-making stratified by cue specificity (that is, whether the outcome was an addictive cue). This necessitated requiring participants to be exposed to drug cues (and neutral cues) to induce craving fluctuations during the task. Note that there is inherent complexity in the presentation of addictive cues; they might be appetitive/rewarding (because participants are given explicit instruction to maximize score and are drug seeking) or aversive (because seeing drug cues without being able to consume may be aversive or participants might be seeking to quit or abstain). Nevertheless, the pairing of trial wins to craving-inducing cues was essential in answering our question regarding the trial-wise influence of momentary craving on altering the optimal decision-making process. In addition, the explicit instructions to seek out substance cues were designed to mimic real-life situations of drug seeking during addictive states.

Experimental structure

There were 60 trials per condition and 5 practice trials at the start of the experiment for a total of 125 trials. The five practice trials were discarded during the computational modeling. Presentation of conditions was randomized over the cohort, such that half of the participants received the money condition first and the other half received the addictive cue condition first. The winning slot machine presented a win 80% of the time, while the other presented a win 20% of the time. The optimal slot machine switched four times over the course of the experiment (ā€˜reversals’). First selection of the best machine (either right or left) was pseudorandomized, and reversal timings were pseudorandomized as either 12–12–11 or 13–12–10. Craving ratings were assessed approximately every 3 trials for a total of 20 craving ratings per condition. Mood ratings were assessed approximately every 5 trials for a total of 12 mood ratings. Ratings were collected on a linear scale from 0 to 50, where participants used the arrow keys to move a cursor from left to right to select their answers, where the left end of the bar was labeled ā€˜Low’ and the right end was labeled ā€˜High’.

Quality control

Before analysis, participant data from each group underwent a series of quality control checks. First, we included one attention check during the experiment, with instructions about halfway through the task to select the highest option if the participant was paying attention. Participants that did not pass this attention check were excluded. Second, since the optimal machines and reversal timings are known, it is possible to construct the vector of optimal choices during the experiment. A participant’s raw choice behaviors were compared with the optimal choice vectors for each condition, and only participants with greater than 50% optimality were selected, ensuring that they learned to play the game well and were not simply randomly responding in the task, with a relatively low bar of exclusion reducing the chance that randomness of task structure did not unnecessarily exclude well-performing participants. Finally, we z scored the reported craving ratings during the task and excluded the participant if the standard deviation of the ratings did not exceed 1, suggesting that there was very low variability in the rating scores. Moreover, we found that most participants with extremely low craving variability seemed to exclusively report a craving of 25/50, which was the default value in the rating scale, suggesting that they were not properly responding to the prompts.

Model-agnostic analysis

  1. (1)

    Calculation of mean and s.d. of craving and mood: we calculated the means and variances of reported cravings for each individual across groups and conditions. These were used to assess whether simple summary statistics of overall trends of craving were associated with clinical measures. The same process was repeated for reported moods. Individual distributions of cravings and moods were aggregated and reported across groups and conditions.

  2. (2)

    Survey scoring: we scored the Redcap surveys for each group according to a questionnaire-specific scoring. For the alcohol group, we utilized three surveys: ASSIST, Alcohol Use Disorders Identification Test (AUDIT) and ADS. ASSIST scores ranged from 0 to 40 (<3 low risk, 4–26 medium risk, >26 high risk). AUDIT scores ranged from 0 to 34 (<7 low severity, 8–14 medium severity, >14 high severity). ADS scores ranged from 0 to 54 (0 no risk, 1–13 low risk, 14–21 moderate risk, 22–30 substantial risk, >30 severe risk). For the cannabis group, we utilized three surveys: ASSIST, CAST and SDS. ASSIST–Cannabis scores ranged from 0 to 36. CAST scores ranged from āˆ’6 to 20 (<3 low risk, 3–6 moderate risk, >6 high risk). SDS scores ranged from āˆ’5 to 13 (≤3 low risk, ≄4 high risk). The distributions of all these surveys were visualized across groups. In addition, we calculated the difference between clinical risk scores before the task (during screening surveys) and after the task to assess the stability of these clinical measures.

  3. (3)

    Choice optimality: we qualitatively assessed the performance of participants to ensure that they were able to learn the structure of the task well. To do this, we compared the choice of machine for all participants in a group with the optimal choice that could be made at that time, where the optimal choice was defined as the machine with a higher probability of winning. Participants with lower than 50% optimality were excluded from further analysis because they were unable to learn the task structure. Finally, we plotted the distributions of optimality of remaining participants to demonstrate that participant optimality was well above chance.

  4. (4)

    Qualitative performance checks: we averaged the choice of machine across participants by group, controlled for randomization of the order of the best machine presented first (either left or right) to qualitatively assess group-level tracking of reversals across the task. We then visualized the overall distributions of craving and mood during the task within individuals and across groups to ensure that there was a realistic distribution of cravings and moods reported during the task.

  5. (5)

    Low-level sanity check correlations: we correlated group-specific clinical scores (ASSIST for alcohol and cannabis) with percentage optimality, total score and baseline and mean craving/mood ratings. We also performed a Spearman correlation between reported mood and craving ratings within participant to check the covariance of the two.

Modeling analysis

Decision-making and craving modeling were done sequentially. Each condition (addictive cue and monetary) was modeled separately. In both conditions, explicit substance craving ratings were used for both decision and craving modeling. Craving ratings in both conditions were up-sampled for use in the decision models (since there were only craving ratings for approximately every three decision trials). There were six classes of decision-making (Supplementary Table 1) balanced on model complexity (that is, number of parameters utilized by the model).

  1. (1)

    Heuristic model: no values were computed during the task. Decisions were made with a noisy WSLS rule in which participants selected the same machine after a winning trial and switched machine choice after a losing trial with some probability ε of selecting a random machine.

    $${\mathrm{Decision}}_{t} \sim {\mathrm{WSLS}}\_{\mathrm{policy}}(\varepsilon,{\mathrm{Decision}}_{t-1},{o}_{t-1})$$
  2. (2)

    Asymmetric temporal difference learning: values were computed using a standard TDRL rule with asymmetric learning from positive and negative prediction errors39,92. Decisions were made with a softmax rule.

    $${V}_{t}={V}_{t-1}+{\alpha }_{\mathrm{pos}}\left({r}_{t}-{V}_{t-1}\right)\,{\mathrm{if}}\,{\mathrm{RPE}} > 0$$
    $${V}_{t}={V}_{t-1}+{\alpha }_{\mathrm{neg}}\left({r}_{t}-{V}_{t-1}\right)\,{\mathrm{if}}\,{\mathrm{RPE}}\le 0$$
    $${\mathrm{Decision}}_{t} \sim {\mathrm{Policy}}(\beta ,{V}_{t})$$
  3. (3)

    Outcome modulation models: values were computed with TDRL, but outcome magnitude was modulated by a bias parameterized by momentary craving and a modulation factor φ (refs. 41,42). Decisions were made with a softmax rule.

    $${r}_{t}^{{\prime} }={r}_{t}+\varphi \times {\mathrm{Craving}}_{t}$$
    $${V}_{t}={V}_{t-1}+{\alpha }_{t}\left({r}_{t}^{{\prime} }-{V}_{t-1}\right)$$
    $${\mathrm{Decision}}_{t} \sim {\mathrm{Policy}}(\beta ,{V}_{t})$$
  4. (4)

    α-bias: values were computed with TDRL, but learning rate α was modulated by a bias parameterized by momentary craving and a modulation factor \(\varphi\). Decisions were made with a softmax rule.

    $${\alpha }_{t}={\alpha }_{\mathrm{static}}+\varphi \times {\mathrm{Craving}}_{t}$$
    $${V}_{t}={V}_{t-1}+{\alpha }_{t}\left({r}_{t}-{V}_{t-1}\right)$$
    $${\mathrm{Decision}}_{t} \sim {\mathrm{Policy}}(\beta ,{V}_{t})$$
  5. (5)

    β-bias: values were computed with standard TDRL but inverse temperature \(\beta\) was modulated by a bias parameterized by momentary craving and a modulation factor \(\varphi\). Decisions were made with a softmax rule.

    $${V}_{t}={V}_{t-1}+\alpha \left({r}_{t}-{V}_{t-1}\right)$$
    $${\beta }_{t}={\beta }_{\mathrm{static}}+\varphi \times {\mathrm{Craving}}_{t}$$
    $${\mathrm{Decision}}_{t} \sim {\mathrm{Policy}}({\beta }_{t},\,{V}_{t})$$
  6. (6)

    m-TDRL: values were computed with TDRL, but outcome was modulated by a momentum term parameterized by nonlinear effects of past prediction errors24,43. Decisions were made with a softmax rule.

    $${h}_{t}={\alpha }_{\mathrm{momentum}}\times ({r}_{t}-{V}_{t-1}-{h}_{t-1})$$
    $${r}_{t}^{{\prime} }={r}_{t}\times {\varphi }^{\tanh (h)}$$
    $${V}_{t}={V}_{t-1}+{\alpha }_{t}\left({r}_{t}^{{\prime} }-{V}_{t-1}\right)$$
    $${\mathrm{Decision}}_{t} \sim {\mathrm{Policy}}(\beta ,{V}_{t})$$

Following decision modeling, in each condition independently, momentary craving was modeled as a nonlinear combination of prior beliefs (that is, expected values) and momentary surprise (that is, prediction errors). Craving trials were not up-sampled to match the number of decision trials, but all EVs and RPEs generated by the decision models (matching the number of decision trials) were incorporated into the geometric decay formulation of the equation to model craving. There were three variants of craving models:

  1. (1)

    Prediction-error-elicited craving (RPE): in this class, momentary craving was modeled as the geometrically decaying effect of momentary prediction errors, along with a static baseline craving.

    $${\mathrm{Craving}}_{t}={\mathrm{Craving}}_{\mathrm{baseline}}+{w}_{\mathrm{PE}}\mathop{\sum }\limits_{j=0}^{t}{\mathrm{RPE}}_{t-j}\times {\gamma }^{t-j}$$
  2. (2)

    Expectation-elicited craving (EV): in this class, momentary craving was modeled as the geometrically decaying effect of momentary expected values, along with a static baseline craving.

    $${\mathrm{Craving}}_{t}={\mathrm{Craving}}_{\mathrm{baseline}}+{w}_{\mathrm{EV}}\mathop{\sum }\limits_{j=0}^{t}{\mathrm{EV}}_{t-j}\times {\gamma }^{t-j}$$
  3. (3)

    Outcome- and expectation-elicited craving (outcome + EV): in this class, momentary craving was modeled as the geometrically decaying effect of both trial outcomes and momentary expected value, along with a static baseline craving.

    $$\begin{array}{rcl}{\mathrm{Craving}}_{t}&=&{\mathrm{Craving}}_{\mathrm{baseline}}+{w}_{\mathrm{outcome}}\mathop{\sum }\limits_{j=0}^{t}{\mathrm{Outcome}}_{t-j}\\&& \times \, {\gamma }^{t-j}+{w}_{\mathrm{EV}}\mathop{\sum }\limits_{j=0}^{t}{\mathrm{EV}}_{t-j}\times {\gamma }^{t-j}\end{array}$$
  4. (4)

    Expectation- and RPE-elicited craving (EV + RPE): in this class, momentary craving was modeled as the geometrically decaying effect of both momentary expected values and momentary prediction errors, along with a static baseline craving.

    $${\mathrm{Craving}}_{t}={\mathrm{Craving}}_{\mathrm{baseline}}+{w}_{\rm{PE}}\mathop{\sum }\limits_{j=0}^{t}{\mathrm{RPE}}_{t-j}\times {\gamma }^{t-j}+{w}_{\mathrm{EV}}\mathop{\sum }\limits_{j=0}^{t}{\mathrm{EV}}_{t-j}\times {\gamma }^{t-j}$$

Model implementation

Decision and craving models were implemented using pyEM93,94,95, a Python library for parameter estimation with iterative expectation maximization. Choices and craving ratings were modeled independently in two stages. During the expectation step, the maximum likelihood probability (PMLE) of task choices for each participant i was defined as the conditional probabilities of each choice at trial t given the expected values (EVt) for both slot machines at that trial and the participant’s parameter vector Īøi (that is, \(\sum \log (p({\rm{Choice}}_{t}|{{\rm{EV}}}_{{{t}}},{\theta }_{i}))\)). The prior probability (Pprior) was defined as the log-likelihood of the participant’s Īøi, given the current group-level Gaussian prior distributions of the parameters (Īø) across participants, with a mean vector μ and standard deviation σ2. The maximum a posteriori estimate was calculated by maximizing the sum of log PMLE + log Pprior across participants. Subsequently, during the maximization step, the group-level Gaussian prior distributions (parameterized by μ and σ2) were recomputed given the Īø vectors computed in the prior expectation step. These steps were repeated until convergence, where maximum a posteriori changed by <0.001 in consecutive iterations, or a maximum of 800 steps. The geometric decay parameter γ was estimated freely in the first iteration and then fixed to the mean of the estimates in the second iteration of model fitting to aid with model convergence. The group-level priors were initialized at μ = 0.1 and σ2 = 100 to allow for uninformative data-driven priors. Decision-model parameters were transformed from the Gaussian parameter space to the decision-model space by applying the appropriate link functions. A sigmoid function was applied to the learning rate (α). Inverse temperature was defined as \({\beta }_{\mathrm{transform}}=10/1+{e}^{-\beta }\). The exact same procedure was applied to the craving ratings, except that trial-wise log-likelihood value was defined as conditional probability of the observed craving rating from a Gaussian centered at the predicted model value at that trial.

Model fit metrics

Posterior samples of parameter sets were used to simulate choices for each participant following parameter fitting. The model ran a simulation for each parameter set for a total of 4,000 datasets of simulated choices. The following checks were performed on the simulated data.

  1. (1)

    Means of these simulations were visualized against true actions to qualitatively assess congruence.

  2. (2)

    Each choice simulation was given an accuracy score representing the number of simulated actions that matched true actions, where 50% was chance accuracy.

Model comparison

Models were compared using the integrated Bayesian information criterion (iBIC) score95,96. Briefly, the iBIC score was calculated with the following steps. First, k = 2,000 samples were drawn from the final group-level Gaussian estimates for parameters Īø(μ, σ2) for each participant i. The log-likelihood of each sample (LLi,k) was computed as the sum of the conditional probabilities of the participant’s choices (choicesi) given the sample parameter vector Īøk. This value was calculated for each sample and across participants. LLi,k were then summed across all participants and samples with the equation \({\mathrm{iLog}}={\sum }_{i}\log (\sum {e}^{{\mathrm{LL}}_{i,k}}/2000)\), and iBIC was defined as \({\mathrm{iBIC}}=-2\times \mathrm{iLog}+{n}_{\mathrm{param}}\times \log ({n}_{\mathrm{trials}})\). This procedure was applied to each model independently. The model with the lowest score was defined as the reference model, and Ī”BIC scores were calculated for each model as the difference from reference model iBIC.

Parameter estimate distributions

Across groups, and within each condition (money or addictive cue) and model type (decision or craving), we identified the best-performing model by the Ī”BIC score. For this model, we plotted the distributions of parameter estimates across participants for all parameters utilized in the model. We visualized decision-making parameters and craving parameters in separate sub-figures for easier summarization and interpretation. For each parameter, we tested the directionality of the estimated effect by calculating statistical significance from zero. All significance testing was performed with a parametric t test (either independent or relative, depending on suitability of the samples) and confirmed with non-parametric Mann–Whitney U tests.

Clinical score prediction

We used a multiple linear regression model to test the hypothesis that joint parameter estimates from best-performing models can successfully predict clinical risk scores. For clinical risk scores, we decided to use ASSIST, a group-specific survey that had high variability across participants. The best-performing decision-making and craving models identified by the model comparison procedure, restricted to the more salient addictive cue condition, was tested in six classes of regression analysis (Supplementary Table 2). In the Demo-only, the participant-specific demographic information (age, sex, education level, self-reported race and income) was used as regressors. In Comp-only, decision- and craving-model parameter estimates were used as regressors. In Agnostic-only, simple means and variances of craving and choice optimality were used to predict clinical risk. In the last three models, demographic regressors were combined with computational regressors (Demo + comp), agnostic regressors (Demo + agnostic) and both (Demo + comp + agnostic). Models were implemented using ā€˜bambi’, a Bayesian linear modeling Python library97. Final reported models represent the best-performing (by Akaike information criterion and elpd_waic) models. Variables not contributing significantly to prediction, demonstrated by large variation in weight estimates, were omitted (for example, race and sex in the cannabis group).

Ethics statement

Ethical approval was obtained for each individual’s behavioral data. Patients provided informed consent as part of their participation in the original experiments, which included agreement for data sharing for research purposes.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.