Table 1. RL hyperparameters used by the SAC agent.
| Hyperparameter | Symbol | Value |
|---|---|---|
| Algorithm | – | SAC |
| Learning rate | \(\eta\) | \(3\times 10^{-4}\) |
| Discount factor | \(\gamma\) | 0.99 |
| Replay buffer size | \(\lvert \mathscr{D} \rvert\) | \(1.0\times 10^{6}\) |
| Batch size | \(B\) | 256 |
| Target smoothing coefficient | \(\tau\) | \(5\times 10^{-3}\) |
| Entropy temperature | \(\alpha\) | Auto-tuned |
| Target entropy | \(H_{\text{tgt}}\) | \(-\dim(\mathscr{A})\) |
| Learning starts | – | \(10^{4}\) steps |
| Training steps | \(N_{\text{train}}\) | \(2\times 10^{5}\) |
| Network architecture | – | (256, 256) MLP |
| Optimizer | – | Adam |
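
For readers who want to reuse these settings, the table can be captured as a small configuration object. The following is a minimal sketch, not the authors' code: the class name `SACConfig`, its field names, and the helpers `target_entropy` and `polyak_update` are hypothetical, and the soft-update rule shown is the standard SAC form, which we assume is what the target smoothing coefficient \(\tau\) refers to. We also assume that "learning starts" means the number of environment steps collected before gradient updates begin.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SACConfig:
    """Values copied from Table 1; field names are illustrative only."""
    learning_rate: float = 3e-4                 # eta
    gamma: float = 0.99                         # discount factor
    buffer_size: int = 1_000_000                # |D|
    batch_size: int = 256                       # B
    tau: float = 5e-3                           # target smoothing coefficient
    learning_starts: int = 10_000               # assumed: steps collected before updates begin
    train_steps: int = 200_000                  # N_train
    hidden_sizes: Tuple[int, ...] = (256, 256)  # MLP hidden layer widths
    optimizer: str = "adam"
    auto_tune_alpha: bool = True                # entropy temperature alpha is auto-tuned


def target_entropy(action_dim: int) -> float:
    """Target entropy from Table 1: H_tgt = -dim(A)."""
    return -float(action_dim)


def polyak_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Assumed standard soft update: theta_tgt <- tau * theta + (1 - tau) * theta_tgt."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


cfg = SACConfig()
```

For example, with a hypothetical 6-dimensional action space, `target_entropy(6)` gives the target entropy, and `polyak_update(target_weights, online_weights, cfg.tau)` moves each target weight a fraction \(\tau\) of the way toward its online counterpart.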