Table 4 Learning and algorithm hyperparameters.
| Parameter | Value |
|---|---|
| Discount factor (\(\gamma\)) | 0.9 |
| Energy coefficient (\(\kappa\)) | \(10^{-28}\) |
| Learning rate | 0.001 |
| Batch size | 16 |
| Optimizer | RMSProp |
| Exploration rate (\(\epsilon\)) | \(1 \rightarrow 0.01\) (decay) |
| Replay buffer | Per-device |
| Episodes | 1000 |
| Time slots per episode | 100 |
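
For reference, the settings in Table 4 can be collected into a single configuration object, as in the minimal sketch below. The class name `Hyperparameters`, its field names, and the helper `epsilon_at` are hypothetical and introduced only for illustration. The table states only that \(\epsilon\) decays from 1 to 0.01, so the per-episode geometric schedule shown here is an assumption, not necessarily the schedule used in the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 4; field names are illustrative."""
    gamma: float = 0.9              # discount factor
    kappa: float = 1e-28            # energy coefficient
    learning_rate: float = 0.001
    batch_size: int = 16
    optimizer: str = "RMSProp"
    epsilon_start: float = 1.0      # initial exploration rate
    epsilon_end: float = 0.01       # final exploration rate
    episodes: int = 1000
    slots_per_episode: int = 100


def epsilon_at(episode: int, hp: Hyperparameters) -> float:
    """Exploration rate for a 0-indexed episode.

    ASSUMPTION: geometric decay per episode that reaches epsilon_end
    in the last episode. The table does not specify the decay form.
    """
    if hp.episodes <= 1:
        return hp.epsilon_end
    # Clamp the episode index to the valid range [0, episodes - 1].
    clamped = min(max(episode, 0), hp.episodes - 1)
    fraction = clamped / (hp.episodes - 1)
    ratio = hp.epsilon_end / hp.epsilon_start
    return hp.epsilon_start * ratio ** fraction


if __name__ == "__main__":
    hp = Hyperparameters()
    # Total environment steps implied by the table.
    print("total time slots:", hp.episodes * hp.slots_per_episode)
    for ep in (0, 250, 500, 750, 999):
        print(ep, round(epsilon_at(ep, hp), 4))
```

Under this assumed schedule, exploration starts fully random and reaches the floor of 0.01 only in the final episode; a linear or per-step decay would fit the table equally well.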