
Predicting human decisions with behavioural theories and machine learning

Abstract

Predicting human decisions under risk and uncertainty remains a fundamental challenge across disciplines. Existing models often struggle even in highly stylized tasks like choice between lotteries. Here we introduce BEAST gradient boosting (BEAST-GB), a hybrid model integrating behavioural theory (BEAST) with machine learning. We first present CPC18, a competition for predicting risky choice, in which BEAST-GB won. Then, using two large datasets, we demonstrate that BEAST-GB predicts more accurately than neural networks trained on extensive data and dozens of existing behavioural models. BEAST-GB also generalizes robustly across unseen experimental contexts, surpassing direct empirical generalization, and helps to refine and improve the behavioural theory itself. Our analyses highlight the potential of anchoring predictions on behavioural theory even in data-rich settings and even when the theory alone falters. Our results underscore how integrating machine learning with theoretical frameworks, especially those—like BEAST—designed for prediction, can improve our ability to predict and understand human behaviour.

Fig. 1: Example decision-making task used in CPC18.
Fig. 2: Feature importance analyses for CPC18 data.
Fig. 3: Test-set performance on Choices13k data.
Fig. 4: Test-set performance on HAB22 data.
Fig. 5: Predictive accuracy in context generalization task.

Data availability

Raw data for CPC18, as well as processed data for analyses of the previously published datasets (Choices13k and HAB22), are publicly available at https://doi.org/10.17605/OSF.IO/VW2SU.

Code availability

Code for all models and analyses reported in this study is publicly available at https://doi.org/10.17605/OSF.IO/VW2SU.

References

  1. Bernoulli, D. Exposition of a new theory on the measurement of risk (original 1738). Econometrica 22, 23–36 (1954).

  2. Tversky, A. & Kahneman, D. Advances in prospect theory: cumulative representation of uncertainty. J. Risk Uncertain. 5, 297–323 (1992).

  3. Kahneman, D. & Tversky, A. Prospect theory: an analysis of decision under risk. Econometrica 47, 263–292 (1979).

  4. von Neumann, J. & Morgenstern, O. Theory of Games and Economic Behavior (Princeton Univ. Press, 1947).

  5. Erev, I., Ert, E., Plonsky, O., Cohen, D. & Cohen, O. From anomalies to forecasts: toward a descriptive model of decisions under risk, under ambiguity, and from experience. Psychol. Rev. 124, 369–409 (2017).

  6. He, L., Analytis, P. P. & Bhatia, S. The wisdom of model crowds. Manag. Sci. 68, 3635–3659 (2022).

  7. Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D. & Griffiths, T. L. Using large-scale experiments and machine learning to discover theories of human decision-making. Science 372, 1209–1214 (2021).

  8. Altman, A., Bercovici-Boden, A. & Tennenholtz, M. Learning in one-shot strategic form games. In European Conference on Machine Learning (eds Fürnkranz, J. et al.) 6–17 (Springer, 2006).

  9. Hartford, J. S., Wright, J. R. & Leyton-Brown, K. Deep learning for predicting human strategic behavior. In Advances in Neural Information Processing Systems (eds Lee, D. et al.) 2424–2432 (2016).

  10. Halevy, A., Norvig, P. & Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 24, 8–12 (2009).

  11. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484 (2016).

  12. Peysakhovich, A. & Naecker, J. Using methods from machine learning to evaluate behavioral models of choice under risk and ambiguity. J. Econ. Behav. Organ. 133, 373–384 (2017).

  13. Fudenberg, D. & Liang, A. Predicting and understanding initial play. Am. Econ. Rev. 109, 4112–4141 (2019).

  14. Fudenberg, D., Kleinberg, J., Liang, A. & Mullainathan, S. Measuring the completeness of economic models. J. Polit. Econ. 130, 956–990 (2022).

  15. Agrawal, M., Peterson, J. C. & Griffiths, T. L. Scaling up psychology via scientific regret minimization. Proc. Natl Acad. Sci. USA 117, 8825–8835 (2020).

  16. Plonsky, O., Erev, I., Hazan, T. & Tennenholtz, M. Psychological forest: predicting human behavior. In The Thirty-First AAAI Conference on Artificial Intelligence Vol. 31, 656–662 (AAAI Press, 2017).

  17. Bourgin, D. D., Peterson, J. C., Reichman, D., Russell, S. J. & Griffiths, T. L. Cognitive model priors for predicting human decisions. In International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 5133–5141 (PMLR, 2019).

  18. Plonsky, O., Apel, R., Erev, I., Ert, E. & Tennenholtz, M. When and how can social scientists add value to data scientists? A choice prediction competition for human decision making. Open Science Framework https://doi.org/10.17605/OSF.IO/2X3VT (2018).

  19. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).

  20. Zhou, K., Liu, Z., Qiao, Y., Xiang, T. & Loy, C. C. Domain generalization: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 4396–4415 (2022).

  21. Savage, L. J. The Foundations of Statistics (John Wiley & Sons, 1954).

  22. Allais, M. Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école américaine. Econom. J. Econom. Soc. 21, 503–546 (1953).

  23. Ellsberg, D. Risk, ambiguity, and the Savage axioms. Q. J. Econ. 75, 643–669 (1961).

  24. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  25. Dawes, R. M., Faust, D. & Meehl, P. E. Clinical versus actuarial judgment. Science 243, 1668–1674 (1989).

  26. Einhorn, H. J. Expert measurement and mechanical combination. Organ. Behav. Hum. Perform. 7, 86–106 (1972).

  27. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).

  28. Thomas, T. et al. Modelling dataset bias in machine-learned theories of economic decision-making. Nat. Hum. Behav. https://doi.org/10.1038/s41562-023-01784-6 (2024).

  29. Shoshan, V., Hazan, T. & Plonsky, O. BEAST-Net: learning novel behavioral insights using a neural network adaptation of a behavioral model. Open Science Framework https://osf.io/kaeny/ (2023).

  30. Stewart, N., Reimers, S. & Harris, A. J. L. On the origin of utility, weighting, and discounting functions: how they get their shapes and how to change their shapes. Manag. Sci. 61, 687–705 (2015).

  31. Spektor, M. S., Bhatia, S. & Gluth, S. The elusiveness of context effects in decision making. Trends Cogn. Sci. 25, 843–854 (2021).

  32. Heilprin, E. & Erev, I. The relative importance of the contrast and assimilation effects in decisions under risk. J. Behav. Decis. Mak. 37, e2408 (2024).

  33. Blanchard, G., Deshmukh, A. A., Dogan, U., Lee, G. & Scott, C. Domain generalization by marginal transfer learning. J. Mach. Learn. Res. 22, 1–55 (2021).

  34. Andrews, I., Fudenberg, D., Liang, A. & Wu, C. The transfer performance of economic models. Preprint at https://arxiv.org/abs/2202.04796 (2022).

  35. Dwork, C. et al. The reusable holdout: preserving validity in adaptive data analysis. Science 349, 636–638 (2015).

  36. Hofman, J. M. et al. Integrating explanation and prediction in computational social science. Nature 595, 181–188 (2021).

  37. Agassi, O. D. & Plonsky, O. The importance of non-analytic models in decision making research: an empirical analysis using BEAST. In Proc. Annual Meeting of the Cognitive Science Society (eds Goldwater, M. et al.) 45 (2023).

  38. Yarkoni, T. & Westfall, J. Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122 (2017).

  39. Shafir, S., Reich, T., Tsur, E., Erev, I. & Lotem, A. Perceptual accuracy and conflicting effects of certainty on risk-taking behaviour. Nature 453, 917–920 (2008).

  40. Weber, E. U., Shafir, S. & Blais, A.-R. Predicting risk sensitivity in humans and lower animals: risk as variance or coefficient of variation. Psychol. Rev. 111, 430 (2004).

  41. Plonsky, O. & Erev, I. Prediction oriented behavioral research and its relationship to classical decision research. Open Science Framework https://doi.org/10.31234/osf.io/7uha4 (2021).

  42. d’Eon, G., Greenwood, S., Leyton-Brown, K. & Wright, J. R. How to evaluate behavioral models. In AAAI Conference on Artificial Intelligence Vol. 38, 9636–9644 (AAAI Press, 2024).

  43. Agassi, O. D. & Plonsky, O. Beyond analytic bounds: re-evaluating predictive power in risky decision models. Judgm. Decis. Mak. 19, e35 (2024).

  44. Erev, I., Ert, E., Plonsky, O. & Roth, Y. Contradictory deviations from maximization: environment-specific biases, or reflections of basic properties of human learning? Psychol. Rev. 130, 640–676 (2023).

  45. Plonsky, O., Teodorescu, K. & Erev, I. Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychol. Rev. 122, 621–647 (2015).

  46. Erev, I. & Marx, A. Humans as intuitive classifiers. Front. Psychol. 13, 1041737 (2023).

  47. Liu, Y. & Just, A. SHAPforxgboost: SHAP Plots for ‘XGBoost’. R Package Version 0.1.3 (CRAN, 2023).

  48. Ert, E. & Erev, I. On the descriptive value of loss aversion in decisions under risk: six clarifications. Judgm. Decis. Mak. 8, 214–235 (2013).

  49. Thaler, R. H. & Johnson, E. J. Gambling with the house money and trying to break even: the effects of prior outcomes on risky choice. Manag. Sci. 36, 643–660 (1990).

  50. Payne, J. W. It is whether you win or lose: the importance of the overall probabilities of winning or losing in risky choice. J. Risk Uncertain. 30, 5–19 (2005).

  51. Birnbaum, M. H. New paradoxes of risky decision making. Psychol. Rev. 115, 463–501 (2008).

  52. Barron, G. & Erev, I. Small feedback-based decisions and their limited correspondence to description-based decisions. J. Behav. Decis. Mak. 16, 215–233 (2003).

  53. Busemeyer, J. R. & Townsend, J. T. Decision field theory: a dynamic-cognitive approach to decision making in an uncertain environment. Psychol. Rev. 100, 432–459 (1993).

  54. Diederich, A. & Busemeyer, J. R. Conflict and the stochastic-dominance principle of decision making. Psychol. Sci. 10, 353–359 (1999).

  55. Canty, A. & Ripley, B. boot: Bootstrap R (S-Plus) Functions. R Package Version 1.3-28.1 (CRAN, 2022).

  56. Brandstätter, E., Gigerenzer, G. & Hertwig, R. The priority heuristic: making choices without trade-offs. Psychol. Rev. 113, 409–432 (2006).

  57. Stewart, N., Chater, N. & Brown, G. D. A. Decision by sampling. Cogn. Psychol. 53, 1–26 (2006).

  58. Fiedler, S. & Glöckner, A. The dynamics of decision making in risky choice: an eye-tracking analysis. Front. Psychol. 3, 335 (2012).

  59. Rieskamp, J. The probabilistic nature of preferential choice. J. Exp. Psychol. Learn. Mem. Cogn. 34, 1446–1465 (2008).

  60. Stewart, N., Hermens, F. & Matthews, W. J. Eye movements in risky choice. J. Behav. Decis. Mak. 29, 116–136 (2016).

  61. Pachur, T., Schulte-Mecklenbeck, M., Murphy, R. O. & Hertwig, R. Prospect theory reflects selective allocation of attention. J. Exp. Psychol. Gen. 147, 147–169 (2018).

  62. Pachur, T., Mata, R. & Hertwig, R. Who dares, who errs? Disentangling cognitive and motivational roots of age differences in decisions under risk. Psychol. Sci. 28, 504–518 (2017).

Acknowledgements

O.P. thanks O. D. Agassi for help in analysis of some of the curated data. I.E. acknowledges support from the Israel Science Foundation (grant no. 1821/12). M.T. has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 740435). D.B., J.C.P., D.R. and T.L.G. have received funding from DARPA (cooperative agreement D17AC00004) and the United States National Science Foundation (grant number 1718550). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

O.P., R.A., E.E., M.T. and I.E. organized CPC18 (O.P., E.E. and I.E. designed the experiments and collected the experimental data; I.E. developed the first baseline model; O.P., R.A. and I.E. programmed the baseline models; O.P. and E.E. managed submissions). D.B., J.C.P., D.R., T.L.G. and S.J.R. submitted the winning model for the first track of CPC18. E.C.C. and J.F.C. submitted the winning model for the second track of CPC18. O.P. performed all post-competition analyses, including analyses of Choices13k and HAB22. O.P. wrote the manuscript, and all authors commented on it.

Corresponding author

Correspondence to Ori Plonsky.

Ethics declarations

Competing interests

One of the authors (D.B.) is affiliated with Adobe Research, but his work on this project was done almost exclusively before he had this affiliation. Adobe Research had no role in this project. We declare no other competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Pantelis Analytis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of the usefulness of behavioral models as foresight features in CPC18.

In each case, we tuned and trained an XGB algorithm on CPC18’s training data, using only the objective features (see Table 1) plus the prediction of a single foresight model, and then predicted the test data. The data were restricted to the subset of CPC18 that reflects pure decisions under risk (no feedback or ambiguity), implying training on 182 tasks and testing on 48 tasks. All behavioral models except BEAST were first fitted independently to the training data to provide predictions; BEAST’s predictions (red) used the original parameters from CPC15 (Erev et al., 2017). The ensemble of foresights (rosy-brown) uses all five foresights combined. Bars show the single held-out test-set mean squared error (MSE) per model. Completeness (see Methods) was computed relative to a naïve baseline (MSE = 0.05095) and an irreducible noise limit (MSE = 0.00113).
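
The completeness measure referenced in this caption can be sketched in a few lines. This is a minimal illustration: the function name and the example model MSE are ours, while the baseline and noise-limit values are those stated in the caption.

```python
def completeness(mse_model: float, mse_baseline: float, mse_noise: float) -> float:
    """Fraction of the predictable variation captured by a model:
    0 = no better than the naive baseline, 1 = at the irreducible noise limit."""
    return (mse_baseline - mse_model) / (mse_baseline - mse_noise)

# Values from the Extended Data Fig. 1 caption (CPC18 risk-only subset)
mse_baseline = 0.05095   # naive baseline MSE
mse_noise = 0.00113      # irreducible noise limit MSE

# A hypothetical model achieving a test MSE of 0.007 would be ~88% complete
print(round(completeness(0.007, mse_baseline, mse_noise), 3))  # prints 0.882
```

By this measure, a model matching the naïve baseline scores 0 and a model reaching the irreducible noise limit scores 1.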

Extended Data Fig. 2 Feature importance analyses for Choices13k data.

(a) Test-set performance on Choices13k data when removing different sets of features from BEAST-GB. Data were split into 90% training (8,848 tasks) and 10% held-out test data (983 tasks), and models were trained on fixed, increasing proportions of the training data. This process was repeated 50 times, and results reflect the average test-set MSE over the n = 50 train–test splits. (b) Average absolute SHAP values of BEAST-GB’s features in predicting Choices13k test data, by feature category. “Δ Min payoffs” is both a Naïve and a Psychological feature. For clarity, only the top 20 features are shown. Feature names and definitions appear in Table 1.
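
The aggregation behind panel b (average absolute SHAP value per feature) can be sketched as follows. This is an illustrative stand-in: the SHAP matrix here is synthetic, and all feature names except “Δ Min payoffs” are hypothetical; in the actual analysis the SHAP values would come from the trained BEAST-GB model (for example, via a tree explainer).

```python
import numpy as np

# Hypothetical SHAP value matrix: rows = test tasks, columns = features.
# A tiny random stand-in used only to demonstrate the aggregation step.
rng = np.random.default_rng(0)
feature_names = ["BEAST", "pBetter", "Δ Min payoffs", "Δ EV"]  # illustrative names
shap_values = rng.normal(size=(100, len(feature_names)))

# Global importance = average absolute SHAP value per feature,
# sorted from most to least influential (as in panel b).
importance = np.abs(shap_values).mean(axis=0)
ranking = sorted(zip(feature_names, importance), key=lambda t: -t[1])
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Averaging absolute values (rather than raw SHAP values) prevents positive and negative contributions from cancelling out, so the ranking reflects overall influence regardless of direction.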

Extended Data Fig. 3 2D visualization of all 11,666 choice tasks used in this paper.

Each point is a single choice task represented in two dimensions, obtained by applying a t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm to the set of psychological features of each task (see Table 1). Tasks depicted closer together are conceptually more similar than tasks further apart (though the values of the dimensions have no direct interpretation). The Choices13k data appear to cover well the space from which the CPC18 data come, whereas the HAB22 data differ from both.
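
A minimal sketch of this embedding procedure, assuming a scikit-learn implementation; the feature matrix and hyperparameters below are illustrative stand-ins, not those used in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the psychological-feature matrix of Table 1:
# rows = choice tasks, columns = psychological features (values are synthetic).
rng = np.random.default_rng(42)
features = rng.normal(size=(60, 8))

# Embed each task in 2D; nearby points correspond to conceptually similar tasks.
embedding = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(features)
print(embedding.shape)  # one 2D coordinate per task: (60, 2)
```

In the paper’s figure, the same kind of 2D coordinates are simply scatter-plotted, coloured by dataset (CPC18, Choices13k, HAB22), to compare the coverage of the task space.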

Extended Data Fig. 4 Feature importance analyses for HAB22 data.

(a) HAB22 test-set predictive performance of BEAST-GB (red) and variations of it that remove different feature sets. Bars show the mean test-set MSE across the 50 nested cross-validation folds (n = 50 fold-MSE values). Grey dots form a horizontal dot-histogram of the n = 50 fold-level MSEs (bin width = 0.0005) for each model. Completeness (see Methods) was computed relative to a naïve baseline (average MSE = 0.1314) and an irreducible noise limit (average MSE = 0.0248) in each fold separately, then averaged. (b) Average absolute SHAP values of BEAST-GB’s features in predicting HAB22’s test set, by feature category. “Δ Min payoffs” is both a Naïve and a Psychological feature. For clarity, only the top 20 features are shown. Feature names and definitions appear in Table 1.

Supplementary information

Supplementary Information

Supplementary text, Fig. 1 and Tables 1–3.

Reporting Summary

Peer Review File

Supplementary Data 1

Trial-by-trial individual raw choice data for experiments 1 and 2 described in the Article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Plonsky, O., Apel, R., Ert, E. et al. Predicting human decisions with behavioural theories and machine learning. Nat Hum Behav 9, 2271–2284 (2025). https://doi.org/10.1038/s41562-025-02267-6
