
  • Review Article

Generalizability of choice architecture interventions

Abstract

Although a given choice architecture intervention (a ‘nudge’) can be highly effective in some conditions, it can be ineffective or counterproductive in others. Critically, researchers and practitioners cannot reliably predict which of these outcomes will happen on the basis of current knowledge. In this Review, we present evidence that the average effectiveness of choice architecture interventions on behaviour is smaller than often reported and that there is substantial heterogeneity in their effects. We outline the obstacles to understanding the generalizability of these effects, such as the complex interaction of moderators and their changes over time. We then clarify dimensions of generalizability and research practices (including systematic exploration of moderators and practices designed to enhance generalizability) that could enable evidence on generalizability to be gathered more efficiently. These practices are essential for advancing nuanced theories of behaviour change and for more accurately predicting the effectiveness of choice architecture interventions across diverse populations, settings, treatments, outcomes and analytical approaches.
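The abstract's points about average effects, heterogeneity and moderators can be made concrete with a small simulation. The Python sketch below is not taken from the Review; the effect sizes, the single binary moderator and the study sizes are hypothetical assumptions chosen purely for illustration. It shows two things the abstract describes: a pooled average can mask context-dependent effects, and averaging only statistically significant results inflates the apparent effectiveness of an intervention.

```python
# Illustrative sketch (hypothetical numbers): moderator-driven heterogeneity
# and publication bias can both distort the "average" nudge effect.
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_per_arm = 200, 200

# Hypothetical moderator: in half of the study contexts the intervention
# works (true d = 0.30); in the other half it has no effect (true d = 0.00).
context_works = rng.random(n_studies) < 0.5
true_d = np.where(context_works, 0.30, 0.00)

# Observed standardized effect = true effect + sampling error.
se = np.sqrt(2 / n_per_arm)                       # approximate SE of Cohen's d
observed_d = true_d + rng.normal(0, se, n_studies)

# A naive pooled average hides the between-context heterogeneity.
print(f"true mean effect:            {true_d.mean():.2f}")
print(f"observed mean effect:        {observed_d.mean():.2f}")
print(f"between-context spread (SD): {true_d.std():.2f}")

# Publication bias: keep only studies with z > 1.96 and re-average.
published = observed_d / se > 1.96
print(f"mean of 'published' studies: {observed_d[published].mean():.2f}")
```

Running the sketch, the selected ("published") subset averages roughly twice the true mean effect, while the true effects themselves split into two distinct contexts that the pooled mean conceals.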


Fig. 1: Dimensions of generalizability.
Fig. 2: Learning about generalizability across the research process.



Acknowledgements

ChatGPT was used to check grammar and improve the language of the manuscript. B.S. was supported by the Eötvös Loránd University Excellence Fund (EKA). B.S. thanks M. Szrenka for her patience.

Author information

Contributions

All authors contributed substantially to discussion of the content. B.S. wrote the first draft of the manuscript; D.G.G., D.S. and S.M. reviewed, edited and provided critical revisions.

Corresponding author

Correspondence to Barnabas Szaszi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Psychology thanks Valerio Capraro and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Szaszi, B., Goldstein, D.G., Soman, D. et al. Generalizability of choice architecture interventions. Nat Rev Psychol 4, 518–529 (2025). https://doi.org/10.1038/s44159-025-00471-9

