Table 2 Hyperparameter selection for the proposed deep reinforcement learning model.
| Category | Parameter | Description | Value |
|---|---|---|---|
| Q-learning | Discount factor (γ) | Discount rate applied to future Q-values (rewards) (0 ≤ γ ≤ 1) | 0.85 |
| | Learning rate (α) | Proportion of new learning results reflected in each update (0 ≤ α ≤ 1) | 0.001 |
| | Epsilon (εQ) | Exploration rate, decreasing from the initial value to the final value | 1.0 → 0.2 |
| | Epsilon decay (λ) | Gradual reduction of the exploration rate over time to stabilize learning | 0.99 |
| | Target network update | Frequency (in episodes) of updating the target network | 1 |
| Experience replay | Batch size | Amount of data used for learning at once | 64 |
| | Memory size | Size of the replay memory that stores learning data | 2000 |
| Network configuration | Input layer | Number of input nodes, i.e., state variables | 3 |
| | Hidden layers | Number of nodes in each hidden layer, adjusting the complexity of the model | 32–32 |
| | Output layer | Number of selectable actions | 7 |
| | Loss | Loss function | Mean Squared Error (MSE) |
| | Optimizer | Optimization algorithm | Adam |
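The settings in Table 2 can be collected into a single configuration object; the sketch below is a minimal illustration, not the authors' implementation, and the class and function names (`DQNConfig`, `epsilon_at`) are hypothetical. It also shows how the multiplicative epsilon decay (λ = 0.99) moves the exploration rate from its initial value of 1.0 toward the floor of 0.2.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    # Values taken from Table 2; field names are illustrative.
    gamma: float = 0.85          # discount factor (γ)
    alpha: float = 0.001         # learning rate (α)
    eps_start: float = 1.0       # initial exploration rate (εQ)
    eps_end: float = 0.2         # final exploration rate
    eps_decay: float = 0.99      # multiplicative decay per episode (λ)
    target_update: int = 1       # target-network update interval (episodes)
    batch_size: int = 64         # samples drawn per learning step
    memory_size: int = 2000      # replay-memory capacity
    n_states: int = 3            # input-layer size (state variables)
    hidden: tuple = (32, 32)     # hidden-layer sizes
    n_actions: int = 7           # output-layer size (selectable actions)

def epsilon_at(cfg: DQNConfig, episode: int) -> float:
    """Exponentially decayed epsilon, floored at the final value."""
    return max(cfg.eps_end, cfg.eps_start * cfg.eps_decay ** episode)

cfg = DQNConfig()
print(epsilon_at(cfg, 0))    # 1.0 at the first episode
print(epsilon_at(cfg, 200))  # 0.2 once the decayed value falls below the floor
```

Under this schedule, 1.0 × 0.99^200 ≈ 0.134, so by episode 200 the agent has settled at the final exploration rate of 0.2.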