Fig. 1: A hierarchical framework for locomotor adaptation.
From: Exploration-based learning of a stabilizing controller predicts locomotor adaptation

a Humans adapt readily to numerous locomotor task settings, both familiar and novel. b The proposed hierarchical learning framework contains three components: (i) an inner loop, representing the fast-timescale response of the stabilizing feedback controller (blue), aimed at avoiding falls; (ii) an outer loop, representing reinforcement learning (red) that tunes the parameters of the inner-loop controller to improve a performance objective; and (iii) storage and use of memories of the learned controllers (green). Alternative adaptation mechanisms may use different performance objectives within the same framework (energy, symmetry, task error) or may replace the feedback controller with a sensorimotor transformation consisting of a state estimator followed by the controller. These components act on the physics-based model of the human (Supplementary Fig. 1), allowing it to respond to perturbations and continuously adapt to new situations. c Reinforcement learning mines exploratory noise to estimate the gradient and improve the controller. Initially, the controller parameters p_1 and p_2 are near the optimum of the initial performance landscape (blue). When conditions change, the performance contours change (blue to orange), as does the optimum. Exploratory noise in the controller parameters allows the learner to estimate the gradient of the performance objective and follow the negative of this gradient to improve performance. d Memory takes in task parameters and returns the stored controller parameters p_memory and the associated performance value J_memory. Memory is used in concert with gradient-based learning: the control parameters p_i are updated toward the memory p_memory when doing so improves performance (memory use); otherwise, memory is updated toward the current parameters. Updates toward memory are degraded if they are not aligned with the gradient, and this degradation is mediated by a modified cosine tuning.
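
The mechanisms in panels c and d can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name, learning rate, and noise scale below is hypothetical, the quadratic landscape is a toy stand-in for the performance objective, and the clipped cosine is only one plausible reading of "modified cosine tuning". The sketch also assumes that "improves performance" can be tested by comparing J_memory with the current performance, and that alignment is measured against the descent direction (the negative gradient).

```python
import numpy as np

def estimate_gradient(J, p, rng, sigma=0.05, n_samples=20):
    """Estimate dJ/dp at p by regressing performance changes on exploratory noise."""
    noise = sigma * rng.standard_normal((n_samples, p.size))   # parameter perturbations
    dJ = np.array([J(p + eps) for eps in noise]) - J(p)         # resulting performance changes
    grad, *_ = np.linalg.lstsq(noise, dJ, rcond=None)           # least-squares linear fit
    return grad

def memory_step(p, grad, p_mem, J_mem, J_now, rate_p=0.2, rate_mem=0.2):
    """One memory interaction; returns the updated (p, p_mem, J_mem)."""
    if J_mem < J_now:
        # Memory use: move toward memory, scaled by alignment with the descent direction.
        to_mem = p_mem - p
        denom = np.linalg.norm(to_mem) * np.linalg.norm(grad)
        cos = float(to_mem @ -grad) / denom if denom > 0 else 0.0
        gain = max(0.0, cos)  # assumed tuning: misaligned memory updates are suppressed
        return p + rate_p * gain * to_mem, p_mem, J_mem
    # Memory formation: move the stored controller and its value toward the current ones.
    return p, p_mem + rate_mem * (p - p_mem), J_mem + rate_mem * (J_now - J_mem)

# Toy performance landscape with its optimum shifted away from the starting parameters.
optimum = np.array([1.0, -0.5])

def J(p):
    return float(np.sum((p - optimum) ** 2))

rng = np.random.default_rng(0)
p = np.zeros(2)
for _ in range(100):
    p = p - 0.1 * estimate_gradient(J, p, rng)  # follow the negative estimated gradient
```

In this toy setting the parameters drift toward the shifted optimum using only noisy performance evaluations, which is the qualitative behavior panel c depicts. `memory_step` would be called alongside each gradient step once a stored controller is available for the current task.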