Abstract
Although a given choice architecture intervention (a ‘nudge’) can be highly effective in some conditions, it can be ineffective or even counterproductive in others. Critically, researchers and practitioners cannot reliably predict on the basis of current knowledge which of these outcomes will occur. In this Review, we present evidence that the average effect of choice architecture interventions on behaviour is smaller than often reported and that there is substantial heterogeneity in their effects. We outline the obstacles to understanding the generalizability of these effects, such as the complex interplay of moderators and their changes over time. We then clarify the dimensions of generalizability and the research practices (including systematic exploration of moderators and practices designed to enhance generalizability) that could enable evidence on generalizability to be gathered more efficiently. These practices are essential for advancing nuanced theories of behaviour change and for more accurately predicting the effectiveness of choice architecture interventions across diverse populations, settings, treatments, outcomes and analytical approaches.
Acknowledgements
ChatGPT was used to check grammar and improve the language of the manuscript. B.S. was supported by the Eötvös Loránd University Excellence Fund (EKA). B.S. thanks M. Szrenka for her patience.
Author information
Contributions
All authors contributed substantially to discussion of the content. B.S. wrote the first draft of the manuscript; D.G.G., D.S. and S.M. reviewed and edited the manuscript and provided critical revisions.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Reviews Psychology thanks Valerio Capraro and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Szaszi, B., Goldstein, D.G., Soman, D. et al. Generalizability of choice architecture interventions. Nat Rev Psychol 4, 518–529 (2025). https://doi.org/10.1038/s44159-025-00471-9