Extended Data Fig. 5: The Structured Memory buffers model predicts behavioural sequences.

a) In addition to rings that track task-progress from behavioural steps involving a rewarded place (a conjunction of a place with early goal-progress), there are rings that track task-progress from places conjoined with intermediate and late goal-progress. The anchors of these rings are activated not when the animal is rewarded, but when it passes through a location at a defined, non-zero level of progress towards the upcoming goal.

b) Rings anchored to non-zero goal-progress (e.g. purple outline) allow task-progress to be tracked from behavioural steps in between two goals. Hence, across all rings, a history of the entire sequence of steps taken by the animal, not just the sequence of reward locations, is encoded at any one point in time.

c) Schematic showing distal prediction of an animal’s choices from memory buffers. When the animal visits a goal-progress/place (t = 1) in trial N, a bump of activity is initiated in the memory buffer that is anchored to this goal-progress/place. The anchor is location 2 at intermediate goal-progress (brown) in the top memory buffer, and location 4 at intermediate goal-progress (red) in the bottom memory buffer. This bump travels around the buffer (e.g. t = 2), paced by progress in the task. When the activity bump circles back to a point close to the anchor (t = 3), it can be read out to bias the animal to return, in trial N + 1, to the same goal-progress/place that it visited in the same task state in trial N. This read-out time defines a “decision point” that is specific to each memory buffer. Left: If, at t = 3 in the example given, the bump on the buffer anchored to intermediate goal-progress in location 2 (brown square) is larger than the bump on the buffer for the other option (intermediate goal-progress in location 4; red square), the animal will choose location 2. Right: Location 4 (red square) is chosen if the bump anchored to intermediate goal-progress in location 4 is larger at t = 3. This choice could have been predicted from the bump sizes at an earlier time point (e.g. t = 2), because bump size remains highly stable for the duration of a single trial; this stability is what allows distal prediction of choices from the memory buffers.

Reproduced/adapted with permission from Gil Costa.
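
The read-out logic in panel c can be illustrated with a toy sketch. It is not the model used in the study. The class `MemoryBuffer`, the function `predict_choice`, the number of ring positions (`n_slots = 8`), the read-out window and the bump strengths (0.9 and 0.4) are all hypothetical values chosen for illustration. The only properties taken from the caption are that a bump is initiated at the anchor, moves with task progress, keeps its size within a trial, and is compared across buffers near the anchor.

```python
from dataclasses import dataclass


@dataclass
class MemoryBuffer:
    """Toy ring buffer anchored to one goal-progress/place conjunction."""
    anchor: str             # label of the anchoring goal-progress/place
    n_slots: int            # hypothetical number of positions on the ring
    bump_size: float = 0.0  # size of the activity bump (arbitrary units)
    position: int = 0       # slots travelled since the anchor was visited

    def visit_anchor(self, strength: float) -> None:
        # Visiting the anchor initiates a bump at the anchor position.
        self.bump_size = strength
        self.position = 0

    def advance(self, steps: int = 1) -> None:
        # The bump moves with task progress; its size is held constant,
        # mirroring the within-trial stability described in the caption.
        self.position = (self.position + steps) % self.n_slots

    def at_decision_point(self, window: int = 1) -> bool:
        # True when the bump is within `window` slots of returning to the anchor.
        return self.bump_size > 0 and self.n_slots - self.position <= window


def predict_choice(buffers: list[MemoryBuffer]) -> str:
    # The option whose buffer carries the larger bump is the predicted choice.
    return max(buffers, key=lambda b: b.bump_size).anchor


# Hypothetical example with two competing buffers (brown and red in panel c).
brown = MemoryBuffer("location 2, intermediate goal-progress", n_slots=8)
red = MemoryBuffer("location 4, intermediate goal-progress", n_slots=8)

# t = 1: bumps are initiated (strengths are assumed values).
brown.visit_anchor(0.9)
red.visit_anchor(0.4)

# t = 2: part-way around the ring, a prediction can already be made.
for buf in (brown, red):
    buf.advance(4)
early_prediction = predict_choice([brown, red])

# t = 3: the bumps are close to their anchors, so the choice is read out.
for buf in (brown, red):
    buf.advance(3)
reached_decision_point = brown.at_decision_point() and red.at_decision_point()
final_choice = predict_choice([brown, red])
```

Because the sketch never changes bump size after initiation, the prediction made at t = 2 necessarily matches the choice read out at t = 3. A model with noisy or decaying bumps would need an explicit stability assumption to reproduce that property.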