Fig. 1: Comparison between free energy and robust free energy for policy computation.

From: Distributionally robust free energy principle for decision-making

a A robotic agent navigating a stochastic environment to reach a destination while avoiding obstacles. At a given time-step, k − 1, the agent determines an action \(U_k\) from a policy using a model of the environment (e.g., available at training via a simulator, possibly updated with real-world data) and observations/beliefs (grouped in the state \(X_{k-1}\)). The environment and the model can change over time. Capital letters denote random variables; lower-case letters denote realizations. b The trained model and the agent's environment differ. This mismatch is a training/environment (model) ambiguity: for a state/action pair, the ambiguity set is the set of all possible environments whose statistical complexity with respect to the trained model is at most \(\eta_k(\mathbf{x}_{k-1}, \mathbf{u}_k)\). We use the wording trained model in a very broad sense: a trained model is any model available to the agent offline; for example, it could be obtained from a simulator or, for natural agents, hardwired by evolutionary processes or even determined by prior beliefs. c A free-energy-minimizing agent in an environment matching its own model. The agent determines an action by sampling from the policy \(\pi_k^{\star}(\mathbf{u}_k \mid \mathbf{x}_{k-1})\). Given the model, the policy is obtained by minimizing the variational free energy: the sum of a statistical-complexity term (with respect to a generative model, \(q_{0:N}\)) and an expected-loss term (state/action costs, \(c_k^{(x)}(\mathbf{x}_k)\) and \(c_k^{(u)}(\mathbf{u}_k)\)). d DR-FREE extends the free energy principle to account for model ambiguities. According to DR-FREE, the maximum free energy across all environments in an ambiguity set is minimized to identify a robust policy. This amounts to variational policy optimization under the epistemic uncertainty engendered by an ambiguous environment.
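For concreteness, the two policy-computation problems sketched in panels c and d can be written schematically as follows. This is a minimal rendering based only on the quantities named in this caption; the policy-induced joint distribution \(p_{0:N}\), the horizon \(N\), and the ambiguity set \(\mathcal{P}\) are shorthand introduced here, not necessarily the paper's exact notation. Panel c (free energy minimization, model matching the environment):

\[\pi_{1:N}^{\star} \in \arg\min_{\pi_{1:N}} \; D_{\mathrm{KL}}\!\left(p_{0:N} \,\|\, q_{0:N}\right) + \mathbb{E}_{p_{0:N}}\!\left[\sum_{k=1}^{N} c_k^{(x)}(X_k) + c_k^{(u)}(U_k)\right].\]

Panel d (DR-FREE, worst case over the ambiguity set):

\[\pi_{1:N}^{\star} \in \arg\min_{\pi_{1:N}} \; \max_{p_{0:N} \in \mathcal{P}} \left\{ D_{\mathrm{KL}}\!\left(p_{0:N} \,\|\, q_{0:N}\right) + \mathbb{E}_{p_{0:N}}\!\left[\sum_{k=1}^{N} c_k^{(x)}(X_k) + c_k^{(u)}(U_k)\right] \right\},\]

where \(\mathcal{P}\) collects the environments whose statistical complexity with respect to the trained model is at most \(\eta_k(\mathbf{x}_{k-1}, \mathbf{u}_k)\) at each state/action pair.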
