Table 1 Hyperparameter configuration of the M2ACD.

From: Deep reinforcement learning trajectory planning for robotic manipulator based on simulation-efficient training

| Parameter | Value |
| --- | --- |
| Experience replay buffer | 1000 |
| Batch | 16 |
| Learning rate | \(1\times 10^{-2}\) |
| Network update rate | 100 |
| Exploration rate | \(1\times 10^{-2}\) |
| Epsilon decay | 10 |
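
As a rough illustration of how the values in Table 1 might be carried into code, the sketch below stores them in a small configuration object and uses the buffer size and batch size to build a bounded replay buffer. This is a minimal sketch, not the authors' implementation: the class name `M2ACDConfig`, the field names, and the helpers `make_replay_buffer` and `sample_batch` are hypothetical, and the table does not state the units of the network update rate or the form of the epsilon decay schedule, so those two values are stored without interpretation.

```python
from collections import deque
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class M2ACDConfig:
    """Values copied from Table 1; the field names are hypothetical."""
    replay_buffer_size: int = 1000
    batch_size: int = 16
    learning_rate: float = 1e-2
    network_update_rate: int = 100  # units are not stated in the table
    exploration_rate: float = 1e-2
    epsilon_decay: int = 10         # decay schedule is not stated in the table


def make_replay_buffer(cfg):
    # A bounded deque discards the oldest item once capacity is reached.
    return deque(maxlen=cfg.replay_buffer_size)


def sample_batch(buffer, cfg, rng):
    # Uniform sampling without replacement (an assumption, not from the source);
    # requires len(buffer) >= cfg.batch_size.
    return rng.sample(list(buffer), cfg.batch_size)


cfg = M2ACDConfig()
buffer = make_replay_buffer(cfg)
for step in range(1500):
    # Integers stand in for (state, action, reward, next_state) transitions.
    buffer.append(step)
batch = sample_batch(buffer, cfg, random.Random(0))
```

With these settings the buffer keeps only the most recent 1000 transitions, and each call to `sample_batch` draws 16 of them.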