Table 1 SU-D3QN algorithm pseudocode.

From: Research on time series prediction of multi-process based on deep learning

Algorithm: Dueling Double Deep Q-Network (D3QN)

 

Input: state s

Output: action a

1. Initialize the experience replay buffer D, the network parameters w, and the target network parameters w⁻ = w
2. Set the current state s to the initial state
3. For each episode:
4.   Reset the current state s to the initial state
5.   For each step in the episode:
6.     Select action a using the epsilon-greedy strategy
7.     Execute action a and observe the next state s_{t+1} and the reward r
8.     Store the transition (s, a, r, s_{t+1}) in the experience replay buffer D
9.     Randomly sample a batch of transitions from D
10.    For each transition, compute the state value V and the advantage value A of each action
11.    Calculate the action value Q of each action from V and A
12.    Calculate the target Q-value using the target network
13.    Perform gradient descent on the loss function
14.    Update the target network parameters w⁻ toward w according to the rate tau
15.    Update the state s to s_{t+1}
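The table lists only the step descriptions, so the following minimal sketch shows how steps 6 and 9-14 could map onto code. It assumes a PyTorch implementation with hypothetical names (DuelingQNetwork, train_step, select_action) and standard choices (mean-subtracted advantage aggregation, MSE loss, soft target updates); it is not the paper's SU-D3QN code, and the environment interaction, replay buffer, and outer episode loop (steps 1-5, 8, 15) are omitted.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class DuelingQNetwork(nn.Module):
    """Dueling architecture: shared features split into state value V(s) and advantages A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)
        a = self.advantage(h)
        # Steps 10-11: combine V and A into Q, subtracting the mean advantage for identifiability.
        return v + a - a.mean(dim=1, keepdim=True)


def select_action(online: DuelingQNetwork, state: torch.Tensor, epsilon: float, n_actions: int) -> int:
    """Step 6: epsilon-greedy action selection."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(online(state.unsqueeze(0)).argmax(dim=1).item())


def train_step(online, target, optimizer, batch, gamma: float = 0.99, tau: float = 0.01) -> float:
    """Steps 9-14 for one sampled batch: Double DQN target, MSE loss, soft target update."""
    states, actions, rewards, next_states = batch  # tensors sampled from the replay buffer D

    # Step 12: Double DQN target -- the online network selects the next action,
    # the target network evaluates it.
    with torch.no_grad():
        best_next = online(next_states).argmax(dim=1, keepdim=True)
        target_q = rewards + gamma * target(next_states).gather(1, best_next).squeeze(1)

    # Step 13: gradient descent on the loss between predicted and target Q-values.
    predicted_q = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(predicted_q, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step 14: soft update of the target network parameters with rate tau.
    for p, tp in zip(online.parameters(), target.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)
    return loss.item()
```

In a full training loop, one would construct an online network and a copy of it as the target network, create an optimizer such as torch.optim.Adam over the online parameters, and call select_action and train_step at each environment step as in the pseudocode above.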