Table 1 SU-D3QN algorithm pseudocode.
From: Research on time series prediction of multi-process based on deep learning
**Algorithm: Dueling Double Deep Q-Network (D3QN)**

Input: state s. Output: action a.

1. Initialize the experience replay buffer D and the network parameters w
2. Set the current state s as the initial state
3. For each episode:
4. Reset the current state s to the initial state
5. For each step in the episode:
6. Select action a using the epsilon-greedy strategy
7. Execute action a and observe the next state s_{t+1} and reward r
8. Store the transition (s, a, r, s_{t+1}) in the experience replay buffer D
9. Randomly sample a batch of transitions from D
10. For each transition, compute the state value V(s) and the advantage value A(s, a) of each action
11. Compute the action value of each action: Q(s, a; w) = V(s; w) + A(s, a; w) - mean_{a'} A(s, a'; w)
12. Compute the target Q-value using the target network: y = r + γ Q(s_{t+1}, argmax_{a'} Q(s_{t+1}, a'; w); w')
13. Perform gradient descent on the loss function: L(w) = (y - Q(s, a; w))^2
14. Softly update the target network parameters w' with rate τ
15. Update state s to s_{t+1}
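Below is a minimal sketch of the update step described in the pseudocode (steps 10-14), assuming a PyTorch implementation with a discrete action space. It is illustrative only, not the authors' code: the layer sizes and the placeholder hyperparameters STATE_DIM, N_ACTIONS, GAMMA, and TAU are assumptions, not values from the paper.

```python
# Illustrative sketch (not the paper's implementation): one Dueling Double DQN
# update step in PyTorch. All sizes and hyperparameters below are assumed.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, TAU = 8, 4, 0.99, 0.005  # placeholder values

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, separate V(s) and A(s, a) heads."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # state value V(s)
        self.advantage = nn.Linear(128, n_actions)   # advantage A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Step 11: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=1, keepdim=True)

online = DuelingQNet(STATE_DIM, N_ACTIONS)
target = DuelingQNet(STATE_DIM, N_ACTIONS)
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)

def update(s, a, r, s_next, done):
    """One gradient step on a sampled batch of transitions (steps 10-14)."""
    with torch.no_grad():
        # Step 12 (Double DQN): the online net selects the action,
        # the target net evaluates it.
        best_a = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, best_a).squeeze(1)
        y = r + GAMMA * (1.0 - done) * q_next
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)  # step 13: L(w) = (y - Q(s, a; w))^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Step 14: soft update of the target parameters with rate tau.
    for p_t, p in zip(target.parameters(), online.parameters()):
        p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)
    return loss.item()
```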