Extended Data Fig. 1: t-SNE projections of buffer data inference results after each base task training. | Nature Machine Intelligence

From: Preserving and combining knowledge in robotic lifelong reinforcement learning

The data are randomly sampled from the buffer and fed into the task encoder for inference. The buffer reserves space for only three tasks; new incoming inputs overwrite the earliest data in the buffer. We use this method to force the agent to pause on the corresponding task for a period of time, and then evaluate its few-shot performance in the subsequent loops (few-shot revisit and knowledge recall). The DPMM dynamically adjusts its knowledge clustering components using the ‘birth’ and ‘merge’ heuristics, fitting model parameters to the observed data. This approach eliminates the need to predetermine, or make any assumptions about, the number of tasks the agent may encounter.
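
The three-task buffer described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ThreeTaskBuffer`, its methods, and the task names are hypothetical, and the sketch assumes that eviction happens per task (the earliest task's data is dropped when a fourth task arrives), which is only one reading of "overwrite the earliest data".

```python
import random
from collections import OrderedDict


class ThreeTaskBuffer:
    """Hypothetical FIFO buffer that keeps data for at most `max_tasks` tasks."""

    def __init__(self, max_tasks=3):
        self.max_tasks = max_tasks
        # Maps task_id -> list of stored items, ordered from earliest to latest task.
        self._data = OrderedDict()

    def add(self, task_id, item):
        # Assumption: when a new task arrives and the buffer is full,
        # the earliest task's data is discarded to make room.
        if task_id not in self._data and len(self._data) >= self.max_tasks:
            self._data.popitem(last=False)
        self._data.setdefault(task_id, []).append(item)

    def tasks(self):
        # Task ids currently held, earliest first.
        return list(self._data)

    def sample(self, n, rng=random):
        # Randomly sample stored items, e.g. as input to a task encoder.
        pool = [item for items in self._data.values() for item in items]
        return rng.sample(pool, min(n, len(pool)))


# Usage: four tasks arrive in sequence; only the latest three remain.
buf = ThreeTaskBuffer()
for task in ["reach", "push", "pick", "door"]:  # hypothetical task names
    for step in range(5):
        buf.add(task, (task, step))
print(buf.tasks())  # ['push', 'pick', 'door']
```

Because the earliest task's data is no longer available for sampling, the agent cannot rehearse that task until it reappears, which is the pause that the few-shot revisit evaluation relies on.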
