Extended Data Fig. 5: Mixed model result summary.

We summarize the pairwise comparisons reported in Table 1. Each panel corresponds to a set of columns in Table 1, and each color to one of the seven human evaluation attributes we consider. For each pair of game types listed in the panel title, we compare the estimated marginal mean scores under the fitted mixed-effects models. As in Table 1, we use the method of estimated (least-squares) marginal means to compare the three groups of games, accounting for the random effects fitted to individual games and human evaluators.
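
For readers unfamiliar with the method, the comparison can be sketched as follows. This is an illustrative form only, not the exact specification used for Table 1: it assumes, for a single attribute, one fixed effect for game type and crossed random intercepts for games and evaluators. All symbols ($y_{ge}$, $\beta_{t}$, $u_g$, $v_e$, $\hat{m}_t$) are notation introduced here for illustration.

```latex
% Assumed model for one attribute: score y_{ge} given by evaluator e to game g,
% where t(g) is the type (group) of game g.
\begin{align}
  y_{ge} &= \mu + \beta_{t(g)} + u_g + v_e + \varepsilon_{ge}, \\
  u_g &\sim \mathcal{N}(0, \sigma_u^2), \quad
  v_e \sim \mathcal{N}(0, \sigma_v^2), \quad
  \varepsilon_{ge} \sim \mathcal{N}(0, \sigma^2).
\end{align}
% Estimated marginal mean for game type t, and the pairwise contrast
% between two game types t and t' (the quantity compared in each panel).
\begin{align}
  \hat{m}_t &= \hat{\mu} + \hat{\beta}_t, \\
  \hat{m}_t - \hat{m}_{t'} &= \hat{\beta}_t - \hat{\beta}_{t'}.
\end{align}
```

Under this assumed form, one game type serves as the reference level (its $\beta$ fixed at zero) so that the fixed effects are identifiable. The random intercepts $u_g$ and $v_e$ absorb variation specific to particular games and evaluators, so each contrast reflects differences between game types rather than between individual games or raters.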