Abstract
People are remarkably capable of generating their own goals, beginning with child’s play and continuing into adulthood. Despite considerable empirical and computational work on goals and goal-oriented behaviour, models are still far from capturing the richness of everyday human goals. Here we bridge this gap by collecting a dataset of human-generated playful goals (in the form of scorable, single-player games), modelling them as reward-producing programs and generating novel human-like goals through program synthesis. Reward-producing programs capture the rich semantics of goals through symbolic operations that compose, add temporal constraints and allow program execution on behavioural traces to evaluate progress. To build a generative model of goals, we learn a fitness function over the infinite set of possible goal programs and sample novel goals with a quality-diversity algorithm. Human evaluators found that model-generated goals, when sampled from partitions of program space occupied by human examples, were indistinguishable from human-created games. We also discovered that our model’s internal fitness scores predict games that are evaluated as more fun to play and more human-like.
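To make the sampling procedure concrete, the following is a minimal sketch of a MAP-Elites-style quality-diversity loop over goal programs. The callables fitness, mutate and behavioural_characteristics are hypothetical placeholders standing in for the learned fitness function, the program mutation operators and the archive descriptors described in the paper; this illustrates the general technique, not the paper's implementation.

```python
import random

def quality_diversity_search(seed_programs, fitness, mutate,
                             behavioural_characteristics, n_generations=8192):
    """Minimal MAP-Elites-style loop: keep one elite goal program per archive cell."""
    # The archive maps a cell (a tuple of behavioural characteristics)
    # to its current elite (program, fitness score) pair.
    archive = {}
    for program in seed_programs:
        archive[behavioural_characteristics(program)] = (program, fitness(program))

    for _ in range(n_generations):
        # Sample an existing elite and mutate it into a candidate goal program.
        parent, _ = random.choice(list(archive.values()))
        child = mutate(parent)
        cell = behavioural_characteristics(child)
        score = fitness(child)
        # Keep the candidate only if its cell is empty or it beats the incumbent elite.
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (child, score)
    return archive
```

In this scheme, diversity comes from retaining at most one elite per cell of program space, while quality comes from replacing an incumbent only when a mutated candidate receives a higher fitness score.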
Data availability
All data for our study, including raw participant responses in the behavioural experiment, their translations to programs in our DSL and the specification for the DSL, are available via GitHub at https://github.com/guydav/goals-as-reward-producing-programs/ or via Zenodo at https://doi.org/10.5281/zenodo.14238893 (ref. 97).
Code availability
All code for our study, including the code used to analyse and generate figures for our behavioural experiment and the full implementation of our Goal Program Generator (GPG) model, is available via GitHub at https://github.com/guydav/goals-as-reward-producing-programs/ or via Zenodo at https://doi.org/10.5281/zenodo.14238893 (ref. 97). Our behavioural data collection experiment is publicly accessible at https://game-generation-public.web.app/. Code for the behavioural experiment is available via GitHub at https://github.com/guydav/game-creation-behavioral-experiment. Our human evaluation experiment is publicly accessible at https://exps.gureckislab.org/e/expert-caring-chemical/#/welcome. Code for the human evaluation experiment is available via GitHub at https://github.com/guydav/game-fitness-judgements.
References
Dweck, C. S. Article commentary: the study of goals in psychology. Psychol. Sci. 3, 165–167 (1992).
Austin, J. T. & Vancouver, J. B. Goal constructs in psychology: structure, process, and content. Psychol. Bull. 120, 338–375 (1996).
Elliot, A. J. & Fryer, J. W. in Handbook of Motivation Science Vol. 638 (ed. Shah, J. Y.) 235–250 (The Guilford Press, 2008).
Hyland, M. E. Motivational control theory: an integrative framework. J. Pers. Soc. Psychol. 55, 642–651 (1988).
Eccles, J. S. & Wigfield, A. Motivational beliefs, values, and goals. Annu. Rev. Psychol. 53, 109–132 (2002).
Brown, L. V. Psychology of Motivation (Nova Science Publishers, 2007); https://books.google.com/books?id=hzPCuKfpXLMC
Fishbach, A. & Ferguson, M. J. in Social Psychology: Handbook of Basic Principles Vol. 2 (eds Kruglanski, A. W. & Higgins, E. T.) 490–515 (The Guilford Press, 2007).
Pervin, L. A. Goal Concepts in Personality and Social Psychology (Taylor & Francis, 2015); https://books.google.com/books?id=lIXwCQAAQBAJ
Moskowitz, G. B. & Grant, H. The Psychology of Goals Vol. 548 (Guilford Press, 2009).
Molinaro, G. & Collins, A. G. E. A goal-centric outlook on learning. Trends Cogn. Sci. 27, 1150–1164 (2023).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Chu, J., Tenenbaum, J. B. & Schulz, L. E. In praise of folly: flexible goals and human cognition. Trends Cogn. Sci. 28, 628–642 (2024).
Chu, J. & Schulz, L. E. Play, curiosity, and cognition. Annu. Rev. Dev. Psychol. 2, 317–343 (2020).
Lillard, A. S. in Handbook of Child Psychology and Developmental Science Vol. 3 (eds Liben, L. & Mueller, U.) 425–468 (Wiley-Blackwell, 2015).
Andersen, M. M., Kiverstein, J., Miller, M. & Roepstorff, A. Play in predictive minds: a cognitive theory of play. Psychol. Rev. 130, 462–479 (2023).
Oudeyer, P.-Y., Kaplan, F. & Hafner, V. V. Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11, 265–286 (2007).
Nguyen, C. T. Games: Agency as Art (Oxford Univ. Press, 2020).
Kolve, E. et al. AI2-THOR: an interactive 3D environment for visual AI. Preprint at https://arxiv.org/abs/1712.05474 (2017).
Fodor, J. A. The Language of Thought (Harvard Univ. Press, 1979).
Goodman, N. D., Tenenbaum, J. B., Feldman, J. & Griffiths, T. L. A rational analysis of rule-based concept learning. Cogn. Sci. 32, 108–154 (2008).
Piantadosi, S. T., Tenenbaum, J. B. & Goodman, N. D. Bootstrapping in a language of thought: a formal model of numerical concept learning. Cognition 123, 199–217 (2012).
Rule, J. S., Tenenbaum, J. B. & Piantadosi, S. T. The child as hacker. Trends Cogn. Sci. 24, 900–915 (2020).
Wong, L. et al. From word models to world models: translating from natural language to the probabilistic language of thought. Preprint at https://arxiv.org/abs/2306.12672 (2023).
Ghallab, M. et al. PDDL—The Planning Domain Definition Language Tech Report CVC TR-98-003/DCS TR-1165 (Yale Center for Computational Vision and Control, 1998).
Chopra, S., Hadsell, R. & LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 539–546 (IEEE, 2005).
Le-Khac, P. H., Healy, G. & Smeaton, A. F. Contrastive representation learning: a framework and review. IEEE Access 8, 193907–193934 (2020).
Pugh, J. K., Soros, L. B. & Stanley, K. O. Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI https://doi.org/10.3389/frobt.2016.00040 (2016).
Chatzilygeroudis, K., Cully, A., Vassiliades, V. & Mouret, J.-B. Quality-diversity optimization: a novel branch of stochastic optimization. Springer Optim. Appl. 170, 109–135 (2020).
Mouret, J.-B. & Clune, J. Illuminating search spaces by mapping elites. Preprint at https://arxiv.org/abs/1504.04909 (2015).
Ward, T. B. Structured imagination: the role of category structure in exemplar generation. Cogn. Psychol. 27, 1–40 (1994).
Allen, K. R. et al. Using games to understand the mind. Nat. Hum. Behav. https://doi.org/10.1038/s41562-024-01878-9 (2024).
Liu, M., Zhu, M. & Zhang, W. Goal-conditioned reinforcement learning: problems and solutions. In Proc. 31st International Joint Conference on Artificial Intelligence: Survey Track (ed. De Raedt, L.) 5502–5511 (IJCAI, 2022).
Colas, C., Karch, T., Sigaud, O. & Oudeyer, P.-Y. Autotelic agents with intrinsically motivated goal-conditioned reinforcement learning: a short survey. J. Artif. Intell. Res. 74, 1159–1199 (2022).
Icarte, R. T., Klassen, T. Q., Valenzano, R. & McIlraith, S. A. Reward machines: exploiting reward function structure in reinforcement learning. J. Artif. Intell. Res. 73, 173–208 (2022).
Pell, B. Metagame in Symmetric Chess-Like Games UCAM-CL-TR-277 (Univ. Cambridge, Computer Laboratory, 1992).
Hom, V. & Marks, J. Automatic design of balanced board games. In Proc. AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment Vol. 3 (eds Schaeffer, J. & Mateasvol, M.) 25–30 (AAAI Press, 2007).
Browne, C. & Maire, F. Evolutionary game design. IEEE Trans. Comput. Intell. AI Games 2, 1–16 (2010).
Togelius, J. & Schmidhuber, J. An experiment in automatic game design. In 2008 IEEE Symposium On Computational Intelligence and Games 111–118 (IEEE, 2008).
Smith, A. M., Nelson, M. J. & Mateas, M. Ludocore: a logical game engine for modeling videogames. In Proc. 2010 IEEE Conference on Computational Intelligence and Games 91–98 (IEEE, 2010).
Zook, A. & Riedl, M. Automatic game design via mechanic generation. In Proc. AAAI Conference on Artificial Intelligence Vol. 28, https://doi.org/10.1609/aaai.v28i1.8788 (AAAI Press, 2014).
Khalifa, A., Green, M. C., Perez-Liebana, D. & Togelius, J. General video game rule generation. In 2017 IEEE Conference on Computational Intelligence and Games 170–177 (IEEE, 2017).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
Cully, A. Autonomous skill discovery with quality-diversity and unsupervised descriptors. In Proc. Genetic and Evolutionary Computation Conference (ed. López-Ibáñez, M.) 81–89 (Association for Computing Machinery, 2019).
Grillotti, L. & Cully, A. Unsupervised behavior discovery with quality-diversity optimization. IEEE Trans. Evol. Comput. 26, 1539–1552 (2022).
Ullman, T. D., Spelke, E., Battaglia, P. & Tenenbaum, J. B. Mind games: game engines as an architecture for intuitive physics. Trends Cogn. Sci. 21, 649–665 (2017).
Chen, T., Allen, K. R., Cheyette, S. J., Tenenbaum, J. & Smith, K. A. ‘Just in time’ representations for mental simulation in intuitive physics. In Proc. Annual Meeting of the Cognitive Science Society Vol. 45 (UC Merced, 2023); https://escholarship.org/uc/item/3hq021qs
Tang, H., Key, D. & Ellis, K. WorldCoder, a model-based LLM agent: building world models by writing code and interacting with the environment. Preprint at https://arxiv.org/abs/2402.12275 (2024).
Reed, S. et al. A generalist agent. Trans. Mach. Learn. Res. 1ikK0kHjvj (2022).
Gallouédec, Q., Beeching, E., Romac, C. & Dellandréa, E. Jack of all trades, master of some, a multi-purpose transformer agent. Preprint at https://arxiv.org/abs/2402.09844 (2024).
Florensa, C., Held, D., Geng, X. & Abbeel, P. Automatic goal generation for reinforcement learning agents. In Proc. 35th International Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 1515–1528 (PMLR, 2018).
Open Ended Learning Team et al. Open-ended learning leads to generally capable agents. Preprint at https://arxiv.org/abs/2107.12808 (2021).
Du, Y. et al. Guiding pretraining in reinforcement learning with large language models. In Proc. of the 40th International Conference on Machine Learning (eds Krause, A. et al.) 8657–8677 (JMLR, 2023).
Colas, C., Teodorescu, L., Oudeyer, P.-Y., Yuan, X. & Côté, M.-A. Augmenting autotelic agents with large language models. Preprint at https://arxiv.org/abs/2305.12487v1 (2023).
Littman, M. L. et al. Environment-independent task specifications via GLTL. Preprint at http://arxiv.org/abs/1704.04341 (2017).
Leon, B. G., Shanahan, M. & Belardinelli, F. In a nutshell, the human asked for this: latent goals for following temporal specifications. In 10th International Conference on Learning Representations (OpenReview, 2022); https://openreview.net/forum?id=rUwm9wCjURV
Ma, Y. J. et al. Eureka: Human-Level Reward Design via Coding Large Language Models (ICLR, 2023).
Faldor, M., Zhang, J., Cully, A. & Clune, J. OMNI-EPIC: open-endedness via models of human notions of interestingness with environments programmed in code. In 12th International Conference on Learning Representations (OpenReview, 2024); https://openreview.net/forum?id=AgM3MzT99c
Colas, C. et al. Language as a cognitive tool to imagine goals in curiosity-driven exploration. In Proc. 34th International Conference on Neural Information Processing Systems (NIPS ’20) (eds Larochelle, H. et al.) 3761–3774 (Curran Associates, 2020).
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D. & Meder, B. Generalization guides human exploration in vast decision spaces. Nat. Hum. Behav. 2, 915–924 (2018).
Ten, A. et al. in The Drive for Knowledge: The Science of Human Information Seeking (eds. Dezza, I. C. et al.) 53–76 (Cambridge Univ. Press, 2022).
Berlyne, D. E. Novelty and curiosity as determinants of exploratory behaviour. Br. J. Psychol. Gen. Sect. 41, 68–80 (1950).
Gopnik, A. Empowerment as causal learning, causal learning as empowerment: a bridge between Bayesian causal hypothesis testing and reinforcement learning. PhilSci-Archive https://philsci-archive.pitt.edu/23268/ (2024).
Addyman, C. & Mareschal, D. Local redundancy governs infants’ spontaneous orienting to visual-temporal sequences. Child Dev. 84, 1137–1144 (2013).
Du, Y. et al. What can AI learn from human exploration? Intrinsically-motivated humans and agents in open-world exploration. In NeurIPS 2023 Workshop: Information-Theoretic Principles in Cognitive Systems (OpenReview, 2023); https://openreview.net/forum?id=aFEZdGL3gn
Ruggeri, A., Stanciu, O., Pelz, M., Gopnik, A. & Schulz, E. Preschoolers search longer when there is more information to be gained. Dev. Sci. 27, e13411 (2024).
Liquin, E. G., Callaway, F. & Lombrozo, T. Developmental change in what elicits curiosity. In Proc. Annual Meeting of the Cognitive Science Society Vol. 43 (UC Merced, 2021); https://escholarship.org/uc/item/43g7m167
Taffoni, F. et al. Development of goal-directed action selection guided by intrinsic motivations: an experiment with children. Exp. Brain Res. 232, 2167–2177 (2014).
Ten, A., Kaushik, P., Oudeyer, P.-Y. & Gottlieb, J. Humans monitor learning progress in curiosity-driven exploration. Nat. Commun. 12, 5972 (2021).
Baldassarre, G. et al. Intrinsic motivations and open-ended development in animals, humans, and robots: an overview. Front. Psychol. 5, 985 (2014).
Spelke, E. S. & Kinzler, K. D. Core knowledge. Dev. Sci. 10, 89–96 (2007).
Jara-Ettinger, J., Gweon, H., Schulz, L. E. & Tenenbaum, J. B. The naïve utility calculus: computational principles underlying commonsense psychology. Trends Cogn. Sci. 20, 589–604 (2016).
Liu, S., Brooks, N. B. & Spelke, E. S. Origins of the concepts cause, cost, and goal in prereaching infants. Proc. Natl Acad. Sci. USA 116, 17747–17752 (2019).
Jara-Ettinger, J. Theory of mind as inverse reinforcement learning. Curr. Opin. Behav. Sci. 29, 105–110 (2019).
Arora, S. & Doshi, P. A survey of inverse reinforcement learning: challenges, methods and progress. Artif. Intell. 297, 103500 (2021).
Baker, C., Saxe, R. & Tenenbaum, J. Bayesian theory of mind: Modeling joint belief–desire attribution. In Proc. Annual Meeting of the Cognitive Science Society Vol. 33 (UC Merced, 2011); https://escholarship.org/uc/item/5rk7z59q
Velez-Ginorio, J., Siegel, M. H., Tenenbaum, J. B. & Jara-Ettinger, J. Interpreting actions by attributing compositional desires. In Proc. Annual Meeting of the Cognitive Science Society Vol. 39 (eds Gunzelmann, G. et al.) (UC Merced, 2017); https://escholarship.org/uc/item/3qw110xj
Ho, M. K. & Griffiths, T. L. Cognitive science as a source of forward and inverse models of human decisions for robotics and control. Annu. Rev. Control Robot. Auton. Syst. 5, 33–53 (2022).
Palan, S. & Schitter, C. Prolific.ac—a subject pool for online experiments. J. Behav. Exp. Finance 17, 22–27 (2018).
Icarte, R. T., Klassen, T., Valenzano, R. & McIlraith, S. Using reward machines for high-level task specification and decomposition in reinforcement learning. In Proc. 35th International Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 2107–2116 (PMLR, 2018).
Brants, T., Popat, A. C., Xu, P., Och, F. J. & Dean, J. Large language models in machine translation. In Proc. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (ed. Eisner, J.) 858–867 (Association for Computational Linguistics, 2007).
Rothe, A., Lake, B. M. & Gureckis, T. M. Question asking as program generation. In Advances in Neural Information Processing Systems 30 (eds Von Luxburg, U. et al.) 1047–1056 (Curran Associates, 2017).
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M.’A. & Huang, F. J. in Predicting Structured Data (eds Bakir, G. et al.) Ch. 10 (MIT Press, 2006).
van den Oord, A., Li, Y. & Vinyals, O. Representation learning with contrastive predictive coding. Preprint at https://arxiv.org/abs/1807.03748v2 (2018).
Charity, M., Green, M. C., Khalifa, A. & Togelius, J. Mech-elites: illuminating the mechanic space of GVG-AI. In Proc. 15th International Conference on the Foundations of Digital Games (eds Yannakakis, G. N. et al.) 8 (Association for Computing Machinery, 2020).
GPT-4 Technical Report (OpenAI, 2023).
Mann, H. B. & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947).
Castro, S. Fast Krippendorff: fast computation of Krippendorff’s alpha agreement measure. GitHub https://github.com/pln-fing-udelar/fast-krippendorff (2017).
Raudenbush, S. W. & Bryk, A. S. Hierarchical Linear Models: Applications and Data Analysis Methods 2nd edn (Sage Publications, 2002).
Hox, J., Moerbeek, M. & van de Schoot, R. Multilevel Analysis (Techniques and Applications) 3rd edn (Routledge, 2018).
Agresti, A. Categorical Data Analysis 3rd edn (Wiley, 2018).
Greene, W. H. & Hensher, D. A. Modeling Ordered Choices: A Primer (Cambridge Univ. Press, 2010).
Christensen, R. H. B. ordinal—regression models for ordinal data. R package version 2023.12-4 https://CRAN.R-project.org/package=ordinal (2023).
R Core Team. R: A Language and Environment for Statistical Computing Version 4.3.2 https://www.R-project.org/ (R Foundation for Statistical Computing, 2023).
Long, J. A. jtools: analysis and presentation of social scientific data. J. Open Source Softw. 9, 6610 (2024).
Lenth, R. V. emmeans: estimated marginal means, aka least-squares means. R package version 1.10.0 https://CRAN.R-project.org/package=emmeans (2024).
Davidson, G., Todd, G., Togelius, J., Gureckis, T. M. & Lake, B. M. guydav/goals-as-reward-producing-programs: release for DOI. Zenodo https://doi.org/10.5281/zenodo.14238893 (2024).
Acknowledgements
G.D. thanks members of the Human and Machine Learning Lab and the Computation and Cognition Lab at NYU for their feedback at various stages of this project. We thank L. Wong for helpful discussions on which questions to prioritize in the human evaluations of our model outputs. We thank O. Timplaru and https://vecteezy.com for the use of the illustration in Fig. 1a. G.D. and B.M.L. are supported by the National Science Foundation (NSF) under NSF Award 1922658. G.T.’s work on this project is supported by the NSF GRFP under grant DGE-2234660. T.M.G.’s work on this project is supported by NSF BCS 2121102.
Author information
Authors and Affiliations
Contributions
G.D. designed and executed the behavioural experiments and analysed their data. G.D. and G.T. jointly designed and implemented the GPG model and designed the human evaluations. G.D. led human evaluation data collection and analysis. G.D. and G.T. led the writing of the paper. J.T. advised on computational modelling and helped write the paper. T.M.G. and B.M.L. jointly advised all work reported in this manuscript and helped write the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Cédric Colas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Online experiment interface.
The main part of the screen presents the AI2-THOR-based experiment room. Below it, we depict the controls. To the right, we show the text prompts for creating a new game (fonts enlarged for visualization). Our experiment is publicly accessible at https://game-generation-public.web.app/.
Extended Data Fig. 2 Common-sense behavioral analyses.
We plot information similar to that in Fig. 2b, but include additional object categories and predicates.
Extended Data Fig. 3 Our implementation of the Goal Program Generator model fills the archive quickly and finds examples with human-like fitness scores.
Left: Our model rapidly finds exemplars for all archive cells (that is, niches induced by our behavioural characteristics), reaching 50% occupancy after 400 generations (out of a total of 8192) and 95% occupancy after 794 generations; the archive is thus almost full one-tenth of the way through the search process. Middle: Our model reaches human-like fitness scores. After only three generations, the fittest sample in the archive has a higher fitness score than at least one participant-created game. By the end of the search, the mean fitness in the archive is close to the mean fitness of human games. Right: Our model generates the vast majority of its samples within the range of fitness scores occupied by participant-created games, though few samples approach the top of the range.
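The occupancy milestones and the fraction of samples falling within the human fitness range can be computed from per-generation archive statistics, as in this minimal sketch (the inputs are assumed to be plain lists of counts and scores, not the paper's actual data structures):

```python
def first_generation_reaching(occupied_counts, total_cells, threshold):
    """First generation at which the fraction of occupied archive cells reaches
    `threshold` (for example 0.5 or 0.95), or None if it never does."""
    for generation, n_occupied in enumerate(occupied_counts):
        if n_occupied / total_cells >= threshold:
            return generation
    return None

def fraction_within_human_range(model_scores, human_scores):
    """Fraction of model-generated samples whose fitness falls within the range
    spanned by participant-created games."""
    lo, hi = min(human_scores), max(human_scores)
    return sum(lo <= s <= hi for s in model_scores) / len(model_scores)
```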
Extended Data Fig. 4 Human evaluations interface.
For each game, participants viewed the same four images of the environment, followed by the GPT-4 back-translated description of the game (see Human evaluation methods for details). They then answered the two free-response and seven multiple-choice questions on the right. In the web page version, the questions appeared below the game description; we present them side-by-side to save space.
Extended Data Fig. 5 Mixed model result summary.
We summarize the pairwise comparisons made in Table 1. Each panel corresponds to a set of columns in Table 1 and each colour to one of the seven human evaluation attributes we consider. We compare the estimated marginal mean scores under the fitted mixed-effects models between each pair of game types listed in the panel title. As in Table 1, we use the method of estimated (least-squares) marginal means to compare the three groups of games, accounting for the random effects fitted to particular games and human evaluators.
Extended Data Fig. 6 Proportion of human interactions activating only matched and real games in the same cell.
Each bar corresponds to a pair of corresponding matched and real games. In each bar, we plot the proportion of relevant interactions (state-action traces) that are unique to the matched game (blue), unique to the real game (green), or shared across both (purple). A few pairs (those with bars mostly or entirely purple) show high similarity between the corresponding games: under 25% of pairs (7/30) share more than half of their relevant interactions. Most pairs, however, show substantial differences between the sets of relevant interactions, with some having a higher fraction unique to the human game and others to the matched model game. The average Jaccard similarity between the sets of relevant interactions for matched and real games is only 0.347 and the median similarity is 0.180 (identical games would score 1.0, entirely dissimilar games 0).
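The similarity reported here is the Jaccard index between the two sets of relevant interactions; a minimal sketch, assuming each trace is represented as a hashable item:

```python
def jaccard_similarity(traces_a, traces_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of relevant
    interactions (state-action traces); 1.0 for identical sets, 0 for disjoint."""
    a, b = set(traces_a), set(traces_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```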
Extended Data Fig. 7 Mixed model (including fitness) coefficient summary.
We summarize the fitted model coefficients listed in Extended Data Table 2. Each panel corresponds to a particular coefficient in Extended Data Table 2 and each colour to one of the seven human evaluation attributes we consider. We plot the fitted coefficient value and a standard error estimated using the Hessian, as implemented in the clmm function of the ordinal R package. We observe the same effects discussed in Extended Data Table 2.
Supplementary information
Supplementary Information
A: Supplementary Fig. 1, mapping from pseudo-code to our DSL. B: analyses about mapping from natural language to our DSL. C: a description of our full fitness feature set. D: algorithm 1 describing our fitness function objective. E: a full description of the MAP-Elites algorithm and our behavioural characteristics in Supplementary Table 1. F: a description of our approach to back-translation from our DSL to natural language. G: our analysis of model-generated sample edit distance from real games, including Supplementary Fig. 2. H: Supplementary Fig. 3, demonstrating the highest-fitness model-generated games. I: detailed descriptions of our human evaluations data analysis, including Supplementary Tables 2–6 and Supplementary Figs. 4–6. J: our model ablations, including Supplementary Figs. 7–9. K: the consent form from our online experiments. L: a full description of our DSL.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Davidson, G., Todd, G., Togelius, J. et al. Goals as reward-producing programs. Nat Mach Intell 7, 205–220 (2025). https://doi.org/10.1038/s42256-025-00981-4
DOI: https://doi.org/10.1038/s42256-025-00981-4