Fig. 2: Phy-Q task templates and evaluation settings.
From: Phy-Q as a measure for physical reasoning intelligence

a, Task templates of the relative height scenario (first row) and tasks generated from the second template in the first row (second row). b, The local generalization and broad generalization evaluation settings. c, Illustration of how generalizing a physical rule is evaluated in the broad generalization setting, using the bouncing scenario as an example.