Table 2 Hyperparameter selection for the proposed deep reinforcement learning model.
| Category | Parameter | Description | Value |
|---|---|---|---|
| Q-learning | Discount factor (γ) | Discount rate applied to future Q-values (rewards) (0 ≤ γ ≤ 1) | 0.85 |
| | Learning rate (α) | Proportion of new learning results reflected in each update (0 ≤ α ≤ 1) | 0.001 |
| | Epsilon (εQ) | Exploration rate, decreasing from the initial value to the final value | 1.0 → 0.2 |
| | Epsilon decay (λ) | Gradual reduction of the exploration rate over time to stabilize learning | 0.99 |
| | Target network update | Frequency (in episodes) of updating the target network | 1 |
| Experience replay | Batch size | Amount of data used for learning at once | 64 |
| | Memory size | Size of the replay memory that stores learning data | 2000 |
| Network configuration | Input layer | Number of input nodes, i.e., state variables | 3 |
| | Hidden layers | Number of nodes in each hidden layer, adjusting the complexity of the model | 32–32 |
| | Output layer | Number of selectable actions | 7 |
| | Loss | Loss function | Mean Squared Error (MSE) |
| | Optimizer | Optimization algorithm | Adam |
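The settings in Table 2 can be collected into a single configuration object; the sketch below is a minimal illustration, not the authors' implementation, and the class and function names (`DQNConfig`, `epsilon_at`) are hypothetical. It also shows how the multiplicative epsilon decay (λ = 0.99) moves the exploration rate from its initial value of 1.0 toward the floor of 0.2.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    # Values taken from Table 2; field names are illustrative.
    gamma: float = 0.85          # discount factor (γ)
    alpha: float = 0.001         # learning rate (α)
    eps_start: float = 1.0       # initial exploration rate (εQ)
    eps_end: float = 0.2         # final exploration rate
    eps_decay: float = 0.99      # multiplicative decay per episode (λ)
    target_update: int = 1       # target-network update interval (episodes)
    batch_size: int = 64         # samples drawn per learning step
    memory_size: int = 2000      # replay-memory capacity
    n_states: int = 3            # input-layer size (state variables)
    hidden: tuple = (32, 32)     # hidden-layer sizes
    n_actions: int = 7           # output-layer size (selectable actions)

def epsilon_at(cfg: DQNConfig, episode: int) -> float:
    """Exponentially decayed epsilon, floored at the final value."""
    return max(cfg.eps_end, cfg.eps_start * cfg.eps_decay ** episode)

cfg = DQNConfig()
print(epsilon_at(cfg, 0))    # 1.0 at the first episode
print(epsilon_at(cfg, 200))  # 0.2 once the decayed value falls below the floor
```

Under this schedule, 1.0 × 0.99^200 ≈ 0.134, so by episode 200 the agent has settled at the final exploration rate of 0.2.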