Table 1 Statistics of individual task success rate with easy-to-hard task ordering

From: Preserving and combining knowledge in robotic lifelong reinforcement learning

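Rows: evaluation task. Columns Reach to Window close: lifelong learning, success rate after training on that task (train after). Forgetting and Forward transfer: lifelong learning metrics. Last column: multi-task.

| Evaluation | Reach | Push | Pick–place | Door open | Faucet open | Drawer close | Button press | Peg unplug | Window open | Window close | Forgetting | Forward transfer | Multi-task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reach | 1.00 | 1.00 | 0.80 | 0.80 | 0.80 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | NA | 1.00 |
| Push | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.80 | 0.80 | 0.80 | 0.20 | 0.00 | 0.80 |
| Pick–place | 0.00 | 0.00 | 0.80 | 1.00 | 0.80 | 0.80 | 1.00 | 0.60 | 1.00 | 0.80 | 0.00 | 0.00 | 0.80 |
| Door open | 0.00 | 0.00 | 0.00 | 0.40 | 0.80 | 0.80 | 0.60 | 0.40 | 0.80 | 0.60 | −0.20 | 0.00 | 1.00 |
| Faucet open | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 1.00 |
| Drawer close | 0.00 | 1.00 | 0.80 | 0.80 | 0.00 | 0.60 | 0.80 | 0.80 | 1.00 | 1.00 | −0.40 | 0.52 | 1.00 |
| Button press | 0.00 | 0.00 | 0.00 | 0.00 | 0.40 | 0.00 | 0.80 | 0.60 | 0.80 | 0.60 | 0.20 | 0.07 | 1.00 |
| Peg unplug | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.00 | 1.00 | 0.60 | 0.60 | 0.40 | 0.03 | 0.80 |
| Window open | 0.00 | 0.00 | 0.40 | 0.00 | 0.60 | 0.00 | 0.20 | 0.00 | 0.80 | 1.00 | −0.20 | 0.15 | 1.00 |
| Window close | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.40 | 0.00 | 0.40 | 0.40 | 1.00 | NA | 0.13 | 1.00 |
| Average | 0.10 | 0.30 | 0.38 | 0.40 | 0.54 | 0.58 | 0.64 | 0.66 | 0.82 | 0.84 | 0.00 | 0.10 | 0.94 |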

In lifelong reinforcement learning (LRL), we assess performance on every task (row-wise) each time the agent completes training on one task of the sequence (column-wise); each task is fed to the agent only once. In multi-task reinforcement learning, the agent is evaluated on every task (row-wise) after simultaneous training on all tasks. Each datum is the average over at least five trials. The metrics ‘forgetting’ and ‘forward transfer’ assess characteristics specific to the LRL agent. ‘Forgetting’, in the range [−1, 1] (equation (2)), measures how much knowledge of a task is lost after later training, so lower values indicate better performance. ‘Forward transfer’, in the range [0, 1] (equation (3)), measures how well knowledge from earlier tasks supports subsequent tasks, so higher values indicate better performance. NA, not available.
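Equations (2) and (3) are not reproduced with this table, so the following Python sketch rests on two assumptions inferred from the tabulated numbers rather than taken from the paper: forgetting for a task is its success rate right after training on that task minus its success rate after the final task, and forward transfer is its mean success rate over all training stages that precede it. Under these assumptions the computed values should match the 'Forgetting' and 'Forward transfer' columns after rounding to two decimals. The names `SUCCESS`, `forgetting` and `forward_transfer` are illustrative only.

```python
TASKS = ["Reach", "Push", "Pick-place", "Door open", "Faucet open",
         "Drawer close", "Button press", "Peg unplug", "Window open",
         "Window close"]

# SUCCESS[i][j]: success rate on task i, evaluated after training on task j
# (lifelong learning block of Table 1, easy-to-hard ordering).
SUCCESS = [
    [1.00, 1.00, 0.80, 0.80, 0.80, 1.00, 1.00, 1.00, 1.00, 1.00],
    [0.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.80, 0.80, 0.80],
    [0.00, 0.00, 0.80, 1.00, 0.80, 0.80, 1.00, 0.60, 1.00, 0.80],
    [0.00, 0.00, 0.00, 0.40, 0.80, 0.80, 0.60, 0.40, 0.80, 0.60],
    [0.00, 0.00, 0.00, 0.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00],
    [0.00, 1.00, 0.80, 0.80, 0.00, 0.60, 0.80, 0.80, 1.00, 1.00],
    [0.00, 0.00, 0.00, 0.00, 0.40, 0.00, 0.80, 0.60, 0.80, 0.60],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.20, 0.00, 1.00, 0.60, 0.60],
    [0.00, 0.00, 0.40, 0.00, 0.60, 0.00, 0.20, 0.00, 0.80, 1.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.40, 0.00, 0.40, 0.40, 1.00],
]


def forgetting(success, i):
    """Assumed definition: success just after learning task i minus
    success after the last task. Undefined (None) for the last task."""
    last = len(success) - 1
    if i == last:
        return None
    return success[i][i] - success[i][last]


def forward_transfer(success, i):
    """Assumed definition: mean success on task i over the stages before
    task i itself is trained. Undefined (None) for the first task."""
    if i == 0:
        return None
    return sum(success[i][:i]) / i


def fmt(value):
    """Format a metric to two decimals, or 'NA' when undefined."""
    return "NA" if value is None else f"{value:.2f}"


for i, name in enumerate(TASKS):
    print(f"{name:13s} forgetting={fmt(forgetting(SUCCESS, i)):>5s} "
          f"forward_transfer={fmt(forward_transfer(SUCCESS, i)):>5s}")
```

Negative forgetting, as for Drawer close, means the task's success rate rose after training on later tasks.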