Table 1 SU-D3QN algorithm pseudocode.

From: Research on time series prediction of multi-process based on deep learning

Algorithm: Dueling Double Deep Q-Network (D3QN)

 

Input: state s

Output: action a

1. Initialize the experience replay buffer D, the network parameters w, and the target network parameters w⁻ = w
2. Set the current state s to the initial state
3. For each episode:
4.   Reset the current state s to the initial state
5.   For each step in the episode:
6.     Select action a using the epsilon-greedy strategy
7.     Execute action a and observe the next state s_{t+1} and the reward r
8.     Store the transition (s, a, r, s_{t+1}) in the experience replay buffer D
9.     Randomly sample a batch of transitions from D
10.    For each transition, compute the state value V and the advantage value A of each action
11.    Calculate the action value Q of each action from V and A
12.    Calculate the target Q-value using the target network
13.    Perform gradient descent on the loss function
14.    Update the target network parameters w⁻ toward w according to the rate tau
15.    Update the state s to s_{t+1}
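The table lists only the step descriptions, so the following minimal sketch shows how steps 6 and 9-14 could map onto code. It assumes a PyTorch implementation with hypothetical names (DuelingQNetwork, train_step, select_action) and standard choices (mean-subtracted advantage aggregation, MSE loss, soft target updates); it is not the paper's SU-D3QN code, and the environment interaction, replay buffer, and outer episode loop (steps 1-5, 8, 15) are omitted.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class DuelingQNetwork(nn.Module):
    """Dueling architecture: shared features split into state value V(s) and advantages A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)
        a = self.advantage(h)
        # Steps 10-11: combine V and A into Q, subtracting the mean advantage for identifiability.
        return v + a - a.mean(dim=1, keepdim=True)


def select_action(online: DuelingQNetwork, state: torch.Tensor, epsilon: float, n_actions: int) -> int:
    """Step 6: epsilon-greedy action selection."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(online(state.unsqueeze(0)).argmax(dim=1).item())


def train_step(online, target, optimizer, batch, gamma: float = 0.99, tau: float = 0.01) -> float:
    """Steps 9-14 for one sampled batch: Double DQN target, MSE loss, soft target update."""
    states, actions, rewards, next_states = batch  # tensors sampled from the replay buffer D

    # Step 12: Double DQN target -- the online network selects the next action,
    # the target network evaluates it.
    with torch.no_grad():
        best_next = online(next_states).argmax(dim=1, keepdim=True)
        target_q = rewards + gamma * target(next_states).gather(1, best_next).squeeze(1)

    # Step 13: gradient descent on the loss between predicted and target Q-values.
    predicted_q = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(predicted_q, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step 14: soft update of the target network parameters with rate tau.
    for p, tp in zip(online.parameters(), target.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)
    return loss.item()
```

In a full training loop, one would construct an online network and a copy of it as the target network, create an optimizer such as torch.optim.Adam over the online parameters, and call select_action and train_step at each environment step as in the pseudocode above.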