Fig. 3
From: Human adaptation to adaptive machines converges to game-theoretic equilibria

Gradient descent in policy space (Experiment 3, \(n = 20\)). Experimental setup and costs are the same as in Fig. 1A,B except that the machine uses a different adaptation algorithm: in this experiment, M iteratively implements the linear policies \(m = L_M h\) and \(m = (L_M + \Delta) h\) to measure the gradient of its cost with respect to its policy slope parameter \(L_M\), then updates this parameter to descend its cost landscape. (A) Median actions and policies for each policy gradient iteration \(k\), overlaid on the game-theoretic equilibria corresponding to the machine's best responses (BR) at the initial and limiting iterations: BR\(_0\), predicted from the Stackelberg equilibrium (SE), and BR\(_\infty\), predicted from the machine's global optimum (RSE). (B) Action distributions for each iteration, displayed as box-and-whisker plots as in Fig. 1D. (C) Policy slope distributions for each iteration, displayed with the same conventions as B; note that the sign of the top subplot's y-axis is reversed for consistency with the other plots. (D) Cost distributions for each iteration, displayed as box-and-whisker plots as in Figs. 1E and 2D. Statistical significance (\(*\)) was determined by comparing the action distribution at iteration \(k = 9\) to SE and RSE using Hotelling's \(T^2\) test (\(P < 0.001\) versus SE and \(P = 0.11\) versus RSE). (E) Error between measured and theoretically predicted policy slopes at each iteration, displayed as box-and-whisker plots as in B,C. (F,G) One- and two-dimensional histograms of actions at different iterations (\(k = 0\) in F, \(k = 9\) in G) with policies and game-theoretic equilibria overlaid (SE in F, RSE in G).
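To make the machine's adaptation rule concrete, the sketch below illustrates the kind of finite-difference policy gradient the caption describes: the machine plays the two linear policies \(m = L_M h\) and \(m = (L_M + \Delta) h\), compares the resulting costs, and steps \(L_M\) downhill. This is a minimal illustration only; the cost function, the human-response placeholder, the step size, and all numerical coefficients are assumptions for readability, not the values used in the experiment.

```python
def machine_cost(m: float, h: float) -> float:
    """Assumed quadratic cost for the machine (illustrative coefficients,
    not the experimental cost from Fig. 1A,B)."""
    return (m - 1.0) ** 2 + 0.5 * h ** 2


def human_response(slope: float) -> float:
    """Placeholder for the human's response to the policy m = slope * h.
    In the experiment this is a person adapting online; an assumed linear
    response stands in here purely to make the loop runnable."""
    return -0.4 * slope + 0.2


def policy_gradient_descent(L_M: float = 0.0, delta: float = 0.05,
                            step: float = 0.5, iterations: int = 10) -> float:
    """Finite-difference descent on the policy slope parameter L_M."""
    for k in range(iterations):
        # Implement the two policies m = L_M * h and m = (L_M + delta) * h
        # and record the machine's cost under each.
        h0 = human_response(L_M)
        c0 = machine_cost(L_M * h0, h0)

        h1 = human_response(L_M + delta)
        c1 = machine_cost((L_M + delta) * h1, h1)

        # One-sided finite-difference estimate of d(cost)/d(L_M).
        grad = (c1 - c0) / delta

        # Update the slope parameter to descend the cost landscape.
        L_M -= step * grad
        print(f"iteration {k}: L_M = {L_M:.3f}, cost = {c0:.3f}")
    return L_M


if __name__ == "__main__":
    policy_gradient_descent()
```

Under these assumptions the loop mirrors the iteration index \(k\) reported in panels A-E: each iteration plays the perturbed policy pair once, estimates one gradient, and takes one descent step on \(L_M\).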