Fig. 2: Compositional state representations generalize and can be built through latent learning.

From: Constructing future behavior in the hippocampal formation through composition and replay

a, We learn a mapping, f, from state representation, s, to optimal action, a (white arrow): f(s) = a. For a given location, s represents either the absolute location in the environment (traditional; red, s = (x)) or the relative vector codes (black arrows) for all objects (walls and rewards) in the environment (compositional; blue, s = (w, r)). b, Policy accuracy at test locations for traditional (red) and compositional (blue) representations in simple reward-only (top) and complex multiwall (bottom) discrete environments. In discrete graph environments, we sample training examples, each consisting of a state representation and its optimal action (blue triangles), both in simple environments with reward only and in complex environments with multiple walls. When trained on a single environment and tested on the same environment, both the traditional and compositional state representations provide accurate policies (panels 1 and 2). Only mappings from compositional state representations yield accurate policies when tested in a new environment (panels 3 and 4). n = 25, error bars = s.e.m. c, Policy accuracy at test locations for traditional (red) and compositional (blue) representations in simple reward-only (top) and complex multiwall (bottom) continuous environments. In continuous environments, where locations are continuous coordinates and actions are continuous directions, we find the same results: either representation works within environments (panels 1 and 2), but only the compositional state representation generalizes (panels 3 and 4). n = 25, error bars = s.e.m. d, How can this compositional state representation be obtained in new environments? The vector code (black arrow) that makes up these representations path-integrates. Because it follows relational rules that are independent of the environment, the representation can be updated with respect to the agent's action (white arrow). For example, if a reward is to the east and the agent goes north, the reward is now to the southeast. e, Path integration allows objects (walls or rewards) to be incorporated serially into the compositional map. The object-vector code is initialized on object discovery and then carried along as the agent explores the environment. f, Policy accuracy at states behind the wall with respect to reward, as a function of the number of visits to those states, when learning structure from the start (blue) or only after finding the reward (orange). Our agent thus learns about the structure of the environment without being rewarded. Once it finds a reward, this latent learning gives access to optimal actions on the first visit to locations behind the wall (blue). Without latent learning, the agent needs to rediscover the wall to obtain the optimal policy behind the wall (orange). Error bars: s.e.m. Icons in panels a, d–f were adapted from Twemoji, under a CC BY 4.0 license.
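The path-integration update described in panels d and e can be illustrated with a minimal sketch (not the authors' code): each object's vector code is stored relative to the agent, so the agent's own displacement is subtracted from every stored vector at each step, and new objects are added to the map when they are discovered. The class and method names (ObjectVectorMap, discover, step) are hypothetical and chosen only for illustration.

```python
import numpy as np


class ObjectVectorMap:
    """Compositional state: relative vectors from the agent to each known object."""

    def __init__(self):
        self.vectors = {}  # object id -> np.array([dx, dy]) relative to the agent

    def discover(self, obj_id, relative_vector):
        """Initialize an object's vector code when the object is first encountered."""
        self.vectors[obj_id] = np.asarray(relative_vector, dtype=float)

    def step(self, displacement):
        """Path-integrate: the agent's movement shifts every relative vector."""
        d = np.asarray(displacement, dtype=float)
        for obj_id in self.vectors:
            self.vectors[obj_id] = self.vectors[obj_id] - d


# Worked example from panel d: a reward one unit to the east (+x);
# the agent moves one unit north (+y); the reward is now to the southeast.
state = ObjectVectorMap()
state.discover("reward", [1.0, 0.0])   # reward east of the agent
state.step([0.0, 1.0])                 # agent goes north
print(state.vectors["reward"])         # [ 1. -1.]: east and south, i.e. southeast
```

Because the update depends only on the agent's displacement and not on which environment it is in, the same mechanism supports the latent learning in panel f: walls discovered before any reward is found are already carried in the map when the reward appears.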