Fig. 1: Training artificial agents to track dynamic odour plumes with DRL.
From: Emergent behaviour and neural dynamics in artificial agents tracking odour plumes

a, A schematic of a flying insect performing a plume tracking task, showing upwind surge, cross-wind cast and U-turn behaviours. In this work, we model the spatial scale (dashed rectangle) where the insect can use only olfactory and mechanosensory wind sensing cues for plume tracking. b, The plume simulator models stochastic emission of odour packets from a source carried by wind. Odour packets are subject to advection by wind, random cross-wind perturbation and radial diffusion. c, An example of a plume simulation where the wind direction changed several times. The centreline of the plume is in red. d, A schematic of how the artificial agent interacts with the environment at each time step. The plume simulator model of the environment determines the sensory information x (egocentric wind-direction vector and local odour concentration) available to the agent and the rewards used in training. The agent navigates within the environment with actions a (turn direction and magnitude of movement). e, Agents are modelled as neural networks and trained by DRL. An RNN generates an internal state representation h from sensory observations, followed by parallel actor and critic heads that implement the agent’s control policy and predict the state values, respectively. The actor and critic heads are two-layer, feedforward MLP networks. f, A schematic to illustrate an agent’s head direction and course direction and the wind direction, all measured with respect to the ground and anticlockwise from the x axis. Course direction is the direction in which the agent actually moves, accounting for the effect of the wind on the agent’s intended direction of movement (head direction). Egocentric wind direction is the direction of the wind as sensed by the agent. Panels a and f adapted with permission from ref. 98 under a Creative Commons licence CC BY 4.0. Panel a inspired by a figure in Baker et al. (ref. 3).
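The geometry in panel f can be made concrete with a small sketch. This is an illustration under stated assumptions, not the simulator used in the work. It assumes that the agent's ground velocity is the vector sum of its intended velocity (along the head direction) and the wind velocity, and that the egocentric wind is the ground-frame wind vector rotated into the agent's head frame. The caption does not say whether the agent's own motion is subtracted from the sensed wind, so that effect is left out here. The function names `course_direction` and `egocentric_wind` are hypothetical.

```python
import math


def course_direction(head_dir, speed, wind_vec):
    """Return the course direction in radians, anticlockwise from the x axis.

    Assumption: ground velocity = intended velocity (speed along head_dir)
    plus the wind velocity wind_vec, both expressed in the ground frame.
    """
    vx = speed * math.cos(head_dir) + wind_vec[0]
    vy = speed * math.sin(head_dir) + wind_vec[1]
    return math.atan2(vy, vx)


def egocentric_wind(head_dir, wind_vec):
    """Rotate the ground-frame wind vector into the agent's head frame.

    In the head frame, +x points along the head direction and +y points to
    the agent's left. Assumption: apparent wind from self-motion is ignored.
    """
    c, s = math.cos(head_dir), math.sin(head_dir)
    wx, wy = wind_vec
    return (c * wx + s * wy, -s * wx + c * wy)


if __name__ == "__main__":
    # Agent heads along +y at unit speed while the wind blows towards +x.
    head = math.pi / 2
    wind = (1.0, 0.0)
    print("course direction (rad):", course_direction(head, 1.0, wind))
    print("egocentric wind vector:", egocentric_wind(head, wind))
```

In this example the wind deflects the course away from the head direction, and in the head frame the wind vector points towards the agent's right. With zero wind the course direction equals the head direction.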