Fig. 1: The deep reinforcement learning (DRL) architecture.

From: Quantum compiling by deep reinforcement learning

a The DRL environment can be described as a quantum circuit modeled by the approximating sequence Un, the fixed tolerance ε, and the target unitary \(\mathcal{U}\) to approximate, which generally changes at each episode. At each time step n, the agent receives the current observation On and, based on that information, chooses from the base \(\mathcal{B}\) the next gate an to apply to the quantum circuit. The environment then returns a real-valued reward rn to the agent, which is a function of the state Sn and the action an. b The policy π of the agent is encoded in a deep neural network (DNN). At each time step, the DNN receives as input a vector made of the real and imaginary parts of the observation On. This information is processed by the hidden layers and returned through the output layer, whose neurons are associated with the action the agent will perform at the next time step. The bottom-right corner shows an example of the nonlinear activation function, i.e., the rectified linear unit (ReLU).
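To make the agent-environment loop and the DNN policy concrete, below is a minimal Python sketch of one episode. It is not the authors' implementation: the gate base \(\mathcal{B}\) (small rotations), the network sizes, the form of the observation, the distance measure, and the reward shape are all illustrative assumptions, and the network weights are left untrained.

```python
# Minimal sketch (not the paper's code) of the DRL loop in Fig. 1.
# Assumptions: single-qubit gates, observation = real/imag parts of the
# remaining error matrix, Frobenius-norm distance, sparse reward.
import numpy as np

# Hypothetical gate base B: small rotations about x, y, z in both directions.
theta = np.pi / 128
def rot(axis, angle):
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    P = {"x": X, "y": Y, "z": Z}[axis]
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * P

BASE = [rot(a, s * theta) for a in "xyz" for s in (+1, -1)]

def observation(U_n, U_target):
    """O_n as a real vector: real and imaginary parts of U_target U_n^dagger."""
    delta = U_target @ U_n.conj().T
    return np.concatenate([delta.real.ravel(), delta.imag.ravel()])

def relu(x):
    """Rectified linear unit, the activation shown in panel b."""
    return np.maximum(0.0, x)

# Toy DNN policy: input -> two ReLU hidden layers -> one output per gate in B.
# Weights are random (untrained); a real agent would learn them.
rng = np.random.default_rng(0)
sizes = [8, 32, 32, len(BASE)]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def policy(obs):
    h = obs
    for W in weights[:-1]:
        h = relu(h @ W)           # hidden layers with ReLU activations
    logits = h @ weights[-1]      # one output neuron per action (gate)
    return int(np.argmax(logits)) # greedy action choice a_n

# One episode: append gates until U_n is within tolerance eps of the target.
U_target = rot("x", 0.3)          # illustrative target unitary
U_n = np.eye(2, dtype=complex)
eps = 1e-2
for n in range(300):
    a_n = policy(observation(U_n, U_target))  # agent chooses gate a_n from B
    U_n = BASE[a_n] @ U_n                     # environment applies it to the circuit
    dist = np.linalg.norm(U_target - U_n)     # illustrative distance measure
    r_n = 1.0 if dist < eps else -1.0 / 300   # sketch of a sparse reward r_n
    if dist < eps:
        break
```

In practice the weights of the policy network would be learned from the rewards, e.g., with a deep Q-learning or policy-gradient algorithm, rather than fixed at random as in this sketch.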