Table 5 Reinforcement learning network training hyperparameters.
Number of layer segments | Single-layer segment | Two-layer segment | Three-layer segment |
---|---|---|---|
Actor learning rate | \(5 \times 10^{-3}\) | \(3 \times 10^{-3}\) | \(7 \times 10^{-3}\) |
Critic learning rate | \(5 \times 10^{-3}\) | \(5 \times 10^{-3}\) | \(7 \times 10^{-3}\) |
Alpha learning rate | \(3 \times 10^{-5}\) | \(3 \times 10^{-10}\) | \(3 \times 10^{-15}\) |
Experience replay buffer size | 10,000 | 20,000 | 50,000 |
Batch size | 64 | 256 | 512 |
Soft update parameter | \(5 \times 10^{-3}\) | \(5 \times 10^{-3}\) | \(5 \times 10^{-3}\) |
Number of training episodes | 500 | 500 | 500 |
Maximum number of steps per episode | 200 | 200 | 200 |
Discount factor | 0.9 | 0.9 | 0.9 |
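
As an illustration, the settings in Table 5 could be collected into a configuration object along the following lines. This is only a sketch: the class name `TrainingHyperparams`, its field names, and the dictionary keys are hypothetical and do not come from the original work. The `soft_update` helper assumes that the soft update parameter is the usual Polyak averaging coefficient for target networks, which the table itself does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingHyperparams:
    """Hypothetical container for one column of Table 5."""
    actor_lr: float
    critic_lr: float
    alpha_lr: float
    replay_buffer_size: int
    batch_size: int
    # The remaining values are identical across all three columns.
    tau: float = 5e-3                 # soft update parameter
    num_episodes: int = 500           # number of training episodes
    max_steps_per_episode: int = 200  # step limit per episode
    gamma: float = 0.9                # discount factor


# One entry per column of Table 5 (keys are illustrative names).
CONFIGS = {
    "single_layer": TrainingHyperparams(5e-3, 5e-3, 3e-5, 10_000, 64),
    "two_layer": TrainingHyperparams(3e-3, 5e-3, 3e-10, 20_000, 256),
    "three_layer": TrainingHyperparams(7e-3, 7e-3, 3e-15, 50_000, 512),
}


def soft_update(target, online, tau):
    """Assumed Polyak update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    cfg = CONFIGS["two_layer"]
    print(cfg.batch_size, cfg.replay_buffer_size)
    # Move a toy target parameter slightly toward the online parameter.
    print(soft_update([0.0], [1.0], cfg.tau))
```

With a soft update parameter of \(5 \times 10^{-3}\), each update moves the target parameters only 0.5% of the way toward the online parameters, so the target network changes slowly relative to the learned one.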