Extended Data Table 1 Non-parametric significance test results mostly corroborate mixed-model results

From: Goals as reward-producing programs

Fitness scores show statistically significant positive effects on the understandability, fun to play and human-likeness attributes, and significant negative effects on the helpfulness, difficulty and creativity questions. Accounting for the role of fitness, matched-group membership shows significant effects only on the fun to play and watch, helpfulness and human-likeness questions. Real-group membership shows significant effects on understandability, fun to play and watch, and human-likeness. Standard errors were estimated from the Hessian as part of model fitting, and coefficient significance is reported using the two-sided Wald test. *P < 0.05, **P < 0.01, ***P < 0.001.

†The full measure description is ‘Helpful for interacting with the simulated environment.’ For most measures, higher scores are better (as marked in the table). The exception is Difficult, on which 3 means ‘appropriately difficult’ and scores below and above 3 indicate too easy and too hard, respectively.
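The significance markers rest on two steps: standard errors taken from the Hessian of the fitted model, then a two-sided Wald test on each coefficient. The sketch below shows both steps under stated assumptions. It assumes a standard normal reference distribution for the Wald statistic and, for brevity, a model with only two parameters. The function names and example numbers are hypothetical. They are not taken from the paper's analysis code or from the table's results.

```python
import math


def se_from_hessian_2x2(h):
    """Standard errors from a 2x2 Hessian of the negative log-likelihood.

    The inverse Hessian at the optimum approximates the coefficient
    covariance matrix; each SE is the square root of a diagonal entry.
    """
    (a, b), (c, d) = h
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("Hessian must be positive definite at the optimum")
    # Diagonal of the inverse of [[a, b], [c, d]] is [d / det, a / det].
    return math.sqrt(d / det), math.sqrt(a / det)


def wald_test(coef: float, se: float) -> tuple[float, float]:
    """Two-sided Wald test of H0: coef == 0 (hypothetical helper).

    z = coef / se, and the two-sided P value under a standard normal
    reference is 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def significance_stars(p: float) -> str:
    """Map a P value to the caption's convention: * < 0.05, ** < 0.01, *** < 0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # Illustrative values only, not results from the table.
    se_fitness, se_group = se_from_hessian_2x2([[4.0, 0.0], [0.0, 25.0]])
    z, p = wald_test(0.5, se_group)
    print(f"SEs=({se_fitness:.2f}, {se_group:.2f}) z={z:.2f} P={p:.4f} {significance_stars(p)}")
```

With a coefficient of 0.5 and a standard error of 0.2, z = 2.5 and P is about 0.012. That earns a single star, because it falls below 0.05 but not below 0.01. Mixed-model software may use a different reference distribution (for example, a t distribution), so this normal-based version is an approximation.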