Table 4. Experimental parameter settings.
| Parameter | Value |
|---|---|
| Number of agents | 10 |
| State space dimensions | 4 |
| Action space dimensions | 2 |
| Number of training rounds | 500 |
| Steps per round | 50 |
| Exploration rate | Initial \(\varepsilon = 0.8\), decreasing by 0.01 per round to a minimum \(\varepsilon_{\min} = 0.1\) |
| Reward discount factor | 0.9 |
| Learning rate | 0.001 |
| Neural network structure | Actor: two hidden layers, 64 neurons each; Critic: two hidden layers, 128 neurons each |
| Optimizer | Adam |
| Loss function | Mean squared error |
| Training batch size | 64 |
| Initial agent state | Randomly generated within the state space |
| Interference noise | Gaussian white noise with mean 0 and standard deviation 0.1 |
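
The settings in Table 4 can be collected into a small configuration object, which also makes the exploration schedule explicit. The following is a minimal sketch, not the authors' implementation: the names `ExperimentConfig`, `epsilon_at`, and `add_interference` are hypothetical, the decay is assumed to be linear in a round index starting at 0, and the noise is assumed to be added independently to each state component.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Values copied from Table 4; field names are illustrative."""
    num_agents: int = 10
    state_dim: int = 4
    action_dim: int = 2
    num_rounds: int = 500
    steps_per_round: int = 50
    eps_init: float = 0.8
    eps_decay: float = 0.01   # subtracted once per round
    eps_min: float = 0.1
    gamma: float = 0.9        # reward discount factor
    learning_rate: float = 0.001
    batch_size: int = 64
    noise_std: float = 0.1    # Gaussian white noise, mean 0


def epsilon_at(round_idx: int, cfg: ExperimentConfig) -> float:
    """Exploration rate for a given round (assumption: round_idx starts at 0)."""
    return max(cfg.eps_min, cfg.eps_init - cfg.eps_decay * round_idx)


def add_interference(state, cfg: ExperimentConfig, rng: random.Random):
    """Add zero-mean Gaussian noise to each state component.

    Assumption: the interference acts on the state vector; Table 4 does not
    say where in the loop the noise is injected.
    """
    return [s + rng.gauss(0.0, cfg.noise_std) for s in state]


cfg = ExperimentConfig()
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
noisy_state = add_interference([0.0] * cfg.state_dim, cfg, rng)
```

Under the linear-decay assumption, the exploration rate reaches its floor after (0.8 - 0.1) / 0.01 = 70 rounds, so the remaining 430 of the 500 training rounds run at \(\varepsilon_{\min} = 0.1\).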